Question about Flink optimizer on Stream API

classic Classic list List threaded Threaded
2 messages Options
Reply | Threaded
Open this post in threaded view
|

Question about Flink optimizer on Stream API

Felipe Gutierrez
Hi,

I was reading some FLIP documents related to the new design of the Flink Schedule [1] and unification of batch and stream [2]. Then I created two different programs to learn how Flink optimizes the Query Plan in Batch and in Stream mode (and how much further it goes). One using batch [3] and one using Stream [4]. During the code debugging and also as it is depicted on the document [2], the batch program uses the org.apache.flink.optimizer.Optimizer class which generates a "org.apache.flink.optimizer.plan.OptimizedPlan" while stream program uses the "org.apache.flink.streaming.api.graph.StreamGraph" and every transformation inside the packet "org.apache.flink.streaming.api.transformations".

When I am showing the execution plan with "env.getExecutionPlan()" I see exactly I have written on the Flink program (which it is expected). However, I was looking for where I can see the optimized plan. I mean decisions of operators reordering based on cost or statistics. For batch I could find the "org.apache.flink.optimizer.costs.CostEstimator" and "org.apache.flink.optimizer.DataStatistics". But for Stream I only found the creation of the plan. How can I debug that? Or have a better understanding of what Flink is doing. Do you advise me to read some other reference about this?

Kind Regards,
Felipe


--
-- Felipe Gutierrez
-- skype: felipe.o.gutierrez
Reply | Threaded
Open this post in threaded view
|

Re: Question about Flink optimizer on Stream API

Till Rohrmann
Hi Felipe,

for streaming Flink currently does not optimize the data flow graph. I think the best reference is actually going through the code as you've done for the batch case.

Cheers,
Till

On Wed, Dec 19, 2018 at 3:14 PM Felipe Gutierrez <[hidden email]> wrote:
Hi,

I was reading some FLIP documents related to the new design of the Flink Schedule [1] and unification of batch and stream [2]. Then I created two different programs to learn how Flink optimizes the Query Plan in Batch and in Stream mode (and how much further it goes). One using batch [3] and one using Stream [4]. During the code debugging and also as it is depicted on the document [2], the batch program uses the org.apache.flink.optimizer.Optimizer class which generates a "org.apache.flink.optimizer.plan.OptimizedPlan" while stream program uses the "org.apache.flink.streaming.api.graph.StreamGraph" and every transformation inside the packet "org.apache.flink.streaming.api.transformations".

When I am showing the execution plan with "env.getExecutionPlan()" I see exactly I have written on the Flink program (which it is expected). However, I was looking for where I can see the optimized plan. I mean decisions of operators reordering based on cost or statistics. For batch I could find the "org.apache.flink.optimizer.costs.CostEstimator" and "org.apache.flink.optimizer.DataStatistics". But for Stream I only found the creation of the plan. How can I debug that? Or have a better understanding of what Flink is doing. Do you advise me to read some other reference about this?

Kind Regards,
Felipe


--
-- Felipe Gutierrez
-- skype: felipe.o.gutierrez