Flink Performance


Flink Performance

Dharani Sudharsan
Hi All,

Currently, I’m running a Flink streaming application with the configuration below.

Task slots: 45
Task Managers: 3
Job Manager: 1
CPUs: 20 per machine

My sample code is below:

Process stream: datastream.flatMap().map().process().addSink()

Data size: approx. 330 GB

Raw stream: datastream.keyBy().window().addSink()
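For context, the two pipelines above might look roughly like this in the DataStream API. This is only a sketch: Event, MyFlatMap, MyMap, MyProcessFunction, the key selector, kafkaSource, and sink are hypothetical placeholders, and a windowed stream needs an aggregation (e.g. a reduce) before a sink can be attached.

```java
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.api.windowing.time.Time;

StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

// kafkaSource stands in for e.g. a FlinkKafkaConsumer; Event is a placeholder type.
DataStream<Event> stream = env.addSource(kafkaSource);

// "Process" pipeline: a chain of stateless-style operators, no shuffle between them.
stream.flatMap(new MyFlatMap())
      .map(new MyMap())
      .process(new MyProcessFunction())
      .addSink(sink);

// "Raw" pipeline: keyBy forces a network shuffle, then a window aggregation.
stream.keyBy(e -> e.getKey())
      .timeWindow(Time.minutes(1))
      .reduce((a, b) -> a.merge(b)) // some aggregation is required before the sink
      .addSink(sink);
```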

When I run the raw stream, the Kafka source reads data at GB rates and gets through the 330 GB in 15 minutes.

But when I run the process stream, backpressure appears, the source reads data only at MB rates, and performance suffers badly.

I’m using the filesystem state backend with checkpointing enabled.
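Enabling that combination typically looks like the following (a configuration sketch for Flink 1.9-era APIs; the checkpoint interval and the checkpoint directory are made-up values):

```java
import org.apache.flink.runtime.state.filesystem.FsStateBackend;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

// Take a checkpoint every 60 s; the interval here is a placeholder value.
env.enableCheckpointing(60_000);

// File-based state backend; the checkpoint directory below is hypothetical.
env.setStateBackend(new FsStateBackend("hdfs:///flink/checkpoints"));
```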

I tried debugging the issue and made some changes to the code, as below.

datastream.keyBy().timeWindow().reduce().flatMap().keyBy().timeWindow().reduce().map().keyBy().process().addSink()
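Spelled out in the DataStream API, that reworked chain would look roughly like this. Again a sketch only: the window sizes, key selectors, and the ExpandFunction / EnrichFunction / FinalProcessFunction operator names are all hypothetical placeholders, not from the original post.

```java
import org.apache.flink.streaming.api.windowing.time.Time;

datastream
    .keyBy(e -> e.getKey())
    .timeWindow(Time.seconds(10))
    .reduce((a, b) -> a.merge(b))        // incremental pre-aggregation per key
    .flatMap(new ExpandFunction())
    .keyBy(e -> e.getKey())
    .timeWindow(Time.minutes(1))
    .reduce((a, b) -> a.merge(b))        // second-stage aggregation
    .map(new EnrichFunction())
    .keyBy(e -> e.getKey())
    .process(new FinalProcessFunction())
    .addSink(sink);
```

A reduce applied to a window aggregates incrementally, so Flink keeps only one value per key and window rather than buffering every element, which is why this variant improved throughput somewhat.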

This time, performance improved slightly but was still not good, and I noticed memory leaks that caused Task Managers to go down and the job to terminate.


Any help would be much appreciated.

Thanks,
Dharani.






Re: Flink Performance

David Magalhães
I've found this post on StackOverflow ( https://stackoverflow.com/questions/50580756/flink-window-dragged-stream-performance ), where someone reports a performance drop caused by keyBy.

On Tue, Jan 21, 2020 at 1:24 PM Dharani Sudharsan <[hidden email]> wrote:






Re: Flink Performance

Dharani Sudharsan
Thanks, David.

But I don’t see any solution provided there.

On Jan 21, 2020, at 7:13 PM, David Magalhães <[hidden email]> wrote:
