Bushy plan execution

classic Classic list List threaded Threaded
2 messages Options
Reply | Threaded
Open this post in threaded view
|

Bushy plan execution

Mathias Eriksen Otkjær

Hi


We have attempted to create a bushy logical plan for Flink, but we are not certain whether it actually gets executed in parallel or in a linear fashion inside Flink (we are certain that it works, as we get the same results now as we did using SQL in the table API). How can we confirm that such a plan does indeed get executed in the parallel fashion that we expect and on different taskmanagers? Should we just trust the framework to execute our query plan with operations running in parallel, or should we somehow manually set the parallelism of the individual join operations inside flink?


Thanks,

Jesper and Mathias

Reply | Threaded
Open this post in threaded view
|

Re: Bushy plan execution

Fabian Hueske-2
Hi,

Flink does not apply join order optimization (neither in the DataSet nor in the Table API). Joins are executed in the same order as they are specified.
You can build bushy join plans for SQL by nesting queries:

SELECT *
 FROM (SELECT * FROM X, Y WHERE x = y) AS t1, (SELECT * FROM U, V WHERE u = v) AS t2)
 WHERE t1.a = t2.b

You can check the execution plan returned by ExecutionEnvironment.getExecutionPlan() and paste it into the web visualizer [1].
The TableEnvironment has its own explain command BatchTableEnvironment.explain(Table) or StreamTableEnvironment.explain(Table).

Let me know if you have more questions,
Fabian

2017-05-17 10:54 GMT+02:00 Mathias Eriksen Otkjær <[hidden email]>:

Hi


We have attempted to create a bushy logical plan for Flink, but we are not certain whether it actually gets executed in parallel or in a linear fashion inside Flink (we are certain that it works, as we get the same results now as we did using SQL in the table API). How can we confirm that such a plan does indeed get executed in the parallel fashion that we expect and on different taskmanagers? Should we just trust the framework to execute our query plan with operations running in parallel, or should we somehow manually set the parallelism of the individual join operations inside flink?


Thanks,

Jesper and Mathias