Reducing runtime of Flink planner


Niklas Teichmann
Hi everybody,

I have a question concerning the planner for the Flink Table / Batch API.
At the moment I'm trying to use a library called Cypher for Apache Flink
(CAPF, https://github.com/soerenreichardt/cypher-for-apache-flink), a project
that implements the graph database query language Cypher on Apache Flink.

The problem is that the planner seems to take a very long time to plan and
optimize the job created by CAPF. On a 24 GB data set, this example job (plan
in JSON format)

https://pastebin.com/J84grsjc

takes about 20 minutes to plan and about 5 minutes to run. The planning time
in particular seems very long for a job of this size.

Do you have any idea why this is the case?
Is there a way to give the planner hints to reduce the planning time?

Thanks in advance!
Niklas

Re: Reducing runtime of Flink planner

Timo Walther
Hi Niklas,

it would be interesting to know which planner causes the long runtime.
Could you use a debugger to find out more? Is it really the Flink Table API
planner, or the underlying DataSet planner one level deeper?

An issue about the DataSet optimizer was closed recently [1]. Could it solve
your problem?

I will also loop in Fabian, who might know more.

Regards,
Timo

[1] https://issues.apache.org/jira/browse/FLINK-10566
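One way to narrow down which planner is slow, without stepping through a debugger, is to time the two phases separately. The sketch below assumes Flink 1.7-era flink-java, flink-clients and flink-table on the classpath; the class name, the query and its data are hypothetical stand-ins for the plan CAPF generates. BatchTableEnvironment.toDataSet() runs the Table API (Calcite) optimization and translates the result into DataSet operators, and ExecutionEnvironment.getExecutionPlan() then runs the DataSet optimizer and returns the optimized plan as JSON, without executing the job.

```java
import org.apache.flink.api.java.DataSet;
import org.apache.flink.api.java.ExecutionEnvironment;
import org.apache.flink.api.java.io.DiscardingOutputFormat;
import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.table.api.Table;
import org.apache.flink.table.api.TableEnvironment;
import org.apache.flink.table.api.java.BatchTableEnvironment;
import org.apache.flink.types.Row;

// Sketch: time the Table API (Calcite) phase and the DataSet optimizer phase
// separately. The query below is a hypothetical stand-in for a CAPF plan.
public class PlannerTimingSketch {

    // Returns {tableApiMillis, dataSetOptimizerMillis}.
    public static long[] timePlanningPhases(ExecutionEnvironment env) throws Exception {
        BatchTableEnvironment tEnv = TableEnvironment.getTableEnvironment(env);

        DataSet<Tuple2<Long, String>> input =
            env.fromElements(Tuple2.of(1L, "a"), Tuple2.of(2L, "b"));
        Table table = tEnv.fromDataSet(input, "id, name")
            .filter("id > 1")
            .select("id, name");

        // Phase 1: Calcite optimization plus translation to DataSet operators.
        long t0 = System.nanoTime();
        DataSet<Row> translated = tEnv.toDataSet(table, Row.class);
        long t1 = System.nanoTime();

        // Phase 2: the DataSet optimizer needs a sink before it can compile the program.
        translated.output(new DiscardingOutputFormat<>());
        long t2 = System.nanoTime();
        env.getExecutionPlan(); // optimizes the program and returns the JSON plan
        long t3 = System.nanoTime();

        return new long[] {(t1 - t0) / 1_000_000, (t3 - t2) / 1_000_000};
    }

    public static void main(String[] args) throws Exception {
        long[] ms = timePlanningPhases(ExecutionEnvironment.getExecutionEnvironment());
        System.out.println("Table API / Calcite: " + ms[0] + " ms");
        System.out.println("DataSet optimizer:   " + ms[1] + " ms");
    }
}
```

The first measurement in a fresh JVM includes class loading and JIT warm-up, so for small queries only the order of magnitude is meaningful; for a plan that takes minutes, the split between the two numbers should show which phase dominates.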

On 07.01.19 at 14:05, Niklas Teichmann wrote:

> Hi everybody,
>
> I have a question concerning the planner for the Flink Table / Batch API.
> At the moment I'm trying to use a library called Cypher for Apache Flink
> (CAPF, https://github.com/soerenreichardt/cypher-for-apache-flink), a project
> that implements the graph database query language Cypher on Apache Flink.
>
> The problem is that the planner seems to take a very long time to plan and
> optimize the job created by CAPF. On a 24 GB data set, this example job (plan
> in JSON format)
>
> https://pastebin.com/J84grsjc
>
> takes about 20 minutes to plan and about 5 minutes to run. The planning time
> in particular seems very long for a job of this size.
>
> Do you have any idea why this is the case?
> Is there a way to give the planner hints to reduce the planning time?
>
> Thanks in advance!
> Niklas


Re: Reducing runtime of Flink planner

Fabian Hueske
Hi Niklas,

The planning time of a job does not depend on the data size.
It would be the same whether you process 5 MB or 5 PB.

FLINK-10566 (which Timo pointed to) fixed a problem for plans with many branching and joining nodes.
Looking at your plan, there are some, but (IMO) not enough to be problematic.

Your idea of providing hints is very good.
You can do that for joins with size hints [1] or strategy hints [2].
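A minimal sketch of the two kinds of join hints described in [1] and [2], using the Flink 1.7-era Java DataSet API and assuming flink-java and flink-clients on the classpath. The class name and the two small data sets are hypothetical stand-ins, not CAPF's actual inputs. A hint only fixes the choice for the join it is attached to; whether that noticeably shortens planning for a large generated plan is not established here, and since these are DataSet API calls, whether CAPF can use them depends on where it creates its joins.

```java
import java.util.List;

import org.apache.flink.api.common.operators.base.JoinOperatorBase.JoinHint;
import org.apache.flink.api.java.DataSet;
import org.apache.flink.api.java.ExecutionEnvironment;
import org.apache.flink.api.java.tuple.Tuple2;

// Sketch of DataSet join hints. The inputs are hypothetical stand-ins.
public class JoinHintsSketch {

    // Hypothetical larger input: (vertexId, label).
    static DataSet<Tuple2<Long, String>> vertices(ExecutionEnvironment env) {
        return env.fromElements(
            Tuple2.of(1L, "Person"), Tuple2.of(2L, "Person"), Tuple2.of(3L, "City"));
    }

    // Hypothetical small input: (vertexId, name).
    static DataSet<Tuple2<Long, String>> names(ExecutionEnvironment env) {
        return env.fromElements(Tuple2.of(1L, "Alice"), Tuple2.of(3L, "Berlin"));
    }

    // Size hint: joinWithTiny() declares that the second input is much smaller
    // than the first, so the optimizer can broadcast it.
    public static List<Tuple2<Tuple2<Long, String>, Tuple2<Long, String>>> sizeHintJoin(
            ExecutionEnvironment env) throws Exception {
        return vertices(env)
            .joinWithTiny(names(env))
            .where(0)    // key: field 0 (vertexId) of the first input
            .equalTo(0)  // key: field 0 (vertexId) of the second input
            .collect();  // triggers optimization and execution
    }

    // Strategy hint: pin the physical join algorithm explicitly, here
    // "repartition both inputs by key, build the hash table from the second".
    public static List<Tuple2<Tuple2<Long, String>, Tuple2<Long, String>>> strategyHintJoin(
            ExecutionEnvironment env) throws Exception {
        return vertices(env)
            .join(names(env), JoinHint.REPARTITION_HASH_SECOND)
            .where(0)
            .equalTo(0)
            .collect();
    }

    public static void main(String[] args) throws Exception {
        ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();
        System.out.println(sizeHintJoin(env));
        System.out.println(strategyHintJoin(env));
    }
}
```

Both joins match vertex ids 1 and 3, so each returns two pairs; only the physical strategy the optimizer is allowed to pick differs.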

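Best, Fabian

[1] https://ci.apache.org/projects/flink/flink-docs-release-1.7/dev/batch/dataset_transformations.html#join-with-dataset-size-hint
[2] https://ci.apache.org/projects/flink/flink-docs-release-1.7/dev/batch/dataset_transformations.html#join-algorithm-hints
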
On Mon, 7 Jan 2019 at 17:36, Timo Walther wrote:
> Hi Niklas,
>
> it would be interesting to know which planner causes the long runtime.
> Could you use a debugger to find out more? Is it really the Flink Table API
> planner, or the underlying DataSet planner one level deeper?
>
> An issue about the DataSet optimizer was closed recently [1]. Could it solve
> your problem?
>
> I will also loop in Fabian, who might know more.
>
> Regards,
> Timo
>
> [1] https://issues.apache.org/jira/browse/FLINK-10566

Re: Reducing runtime of Flink planner

Niklas Teichmann
Hi Fabian and Timo,

Thanks for your answers! At the moment we're working on updating our
project to Flink 1.7 so that we can check whether the commit you mentioned
solves the problem. The debugging we have done so far seems to point to
Calcite as being responsible for the long planning times, but we're still
experimenting. As soon as I have new information, I will share it with you.

Kind Regards,
Niklas

Quoting Fabian Hueske:

> Hi Niklas,
>
> The planning time of a job does not depend on the data size.
> It would be the same whether you process 5 MB or 5 PB.
>
> FLINK-10566 (which Timo pointed to) fixed a problem for plans with many
> branching and joining nodes.
> Looking at your plan, there are some, but (IMO) not enough to be
> problematic.
>
> Your idea of providing hints is very good.
> You can do that for joins with size hints [1] or strategy hints [2].
>
> Best, Fabian
>
> [1]
> https://ci.apache.org/projects/flink/flink-docs-release-1.7/dev/batch/dataset_transformations.html#join-with-dataset-size-hint
> [2]
> https://ci.apache.org/projects/flink/flink-docs-release-1.7/dev/batch/dataset_transformations.html#join-algorithm-hints