Hi!
My goal is to better understand how my code impacts streaming throughput. I have a streaming job where I join multiple tables (A, B, C, D) using interval joins. Case 1) If I have 3 joins in the same query, I don't hit back pressure.
Case 2) If I create temporary views for two of the joins (for reuse with another query), I hit back a lot of back pressure. This is selecting slightly more fields than the first. CREATE TEMPORARY VIEW `AB`
CREATE TEMPORARY VIEW `ABC` Can Temporary Views increase back pressure? If A, B, C and D are roughly the same size (fake data), does the join order matter? E.g. I assume reducing the size of the columns in each join stage would help. Thanks! - Dan |
When I use DataStream and implement the join myself, I can get 50x the throughput. I assume I'm doing something wrong with Flink's Table API and SQL interface. On Tue, Sep 22, 2020 at 11:21 PM Dan Hill <[hidden email]> wrote:
|
Hi Dan,
could you share the plan with us using `TableEnvironment.explainSql()` for both queries? In general, views should not have an impact on the performance. They are a logical concept that gives a bunch of operations a name. The contained operations are inlined into the bigger query during optimization. Unless you execute multiple queries in a StatementSet, the the data is read twice from the source. How do you execute the SQL stataments? For deterministic plans, join reordering is disabled by default. You can set it via: org.apache.flink.table.api.config.OptimizerConfigOptions#TABLE_OPTIMIZER_JOIN_REORDER_ENABLED Regards, Timo On 23.09.20 08:23, Dan Hill wrote: > When I use DataStream and implement the join myself, I can get 50x the > throughput. I assume I'm doing something wrong with Flink's Table API > and SQL interface. > > On Tue, Sep 22, 2020 at 11:21 PM Dan Hill <[hidden email] > <mailto:[hidden email]>> wrote: > > Hi! > > My goal is to better understand how my code impacts streaming > throughput. > > I have a streaming job where I join multiple tables (A, B, C, D) > using interval joins. > > Case 1) If I have 3 joins in the same query, I don't hit back pressure. > > SELECT ... > FROM A > LEFT JOIN B > ON... > LEFT JOIN C > ON... > LEFT JOIN D > ON... > > > Case 2) If I create temporary views for two of the joins (for reuse > with another query), I hit back a lot of back pressure. This is > selecting slightly more fields than the first. > > CREATE TEMPORARY VIEW `AB` > > SELECT ... > FROM A > LEFT JOIN B > ... > > CREATE TEMPORARY VIEW `ABC` > SELECT ... > FROM AB > LEFT JOIN C > ... > > > > Can Temporary Views increase back pressure? > > If A, B, C and D are roughly the same size (fake data), does the > join order matter? E.g. I assume reducing the size of the columns > in each join stage would help. > > Thanks! > - Dan > > |
I can't reproduce the issue now. I'm using the same commits so I'm guessing another environment factor is at play. The explains are pretty similar (there's an extra LogicalProject when there is an extra view). On Fri, Sep 25, 2020 at 12:55 AM Timo Walther <[hidden email]> wrote: Hi Dan, |
Free forum by Nabble | Edit this page |