Heap Usage increase gradually in Task Manager - Flink 1.9 on EMR

bat man
Hi Flink Users,

I am running a Flink 1.9 streaming job on YARN (AWS EMR).
The job performs the following operations:

stream 1 join stream 2 -> stream 3

stream 3 join stream 4 -> sink

Stream 1 carries fast-moving data, while streams 2 and 4 carry less frequent data. I am using KeyedProcessFunction to do the joins. The job also performs other operations, such as dynamically keying the data from stream 1 based on event type and filtering out records when mandatory data is not available.
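
A rough sketch of this topology in plain Python (not Flink's API) may make the description more concrete. The names `KEY_FIELD_BY_TYPE`, `dynamic_key`, `KeyedJoin`, and `pipeline` are hypothetical, as are the example fields. The sketch also assumes that an event is dropped when its key cannot be derived or when there is no record to join against, which is only one possible reading of the filtering described above.

```python
# Hypothetical mapping from event type to the field that carries the join key.
KEY_FIELD_BY_TYPE = {"order": "order_id", "click": "session_id"}


def dynamic_key(event):
    """Pick the join key based on the event type; None if it cannot be derived."""
    field = KEY_FIELD_BY_TYPE.get(event.get("type"))
    return event.get(field) if field else None


class KeyedJoin:
    """Joins fast events against the latest slow record seen for the same key."""

    def __init__(self, name):
        self.name = name
        self.latest = {}  # key -> latest record from the slow stream

    def on_slow(self, key, record):
        self.latest[key] = record

    def on_fast(self, event):
        key = dynamic_key(event)
        if key is None:
            return None  # filtered: mandatory key data is missing
        record = self.latest.get(key)
        if record is None:
            return None  # filtered: nothing to join against yet
        return {**event, self.name: record}


join_1 = KeyedJoin("stream2")  # stream 1 joined with stream 2 -> stream 3
join_2 = KeyedJoin("stream4")  # stream 3 joined with stream 4 -> sink


def pipeline(event):
    stream3_event = join_1.on_fast(event)
    return None if stream3_event is None else join_2.on_fast(stream3_event)


join_1.on_slow("o-1", {"customer": "c-9"})
join_2.on_slow("o-1", {"region": "eu"})

joined = pipeline({"type": "order", "order_id": "o-1", "amount": 5})
dropped = pipeline({"type": "order", "amount": 5})  # no order_id -> filtered
print(joined)
print(dropped)
```

Here each join only keeps the latest slow-stream record per key, so its memory footprint is bounded by the number of distinct keys.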

The job uses FsStateBackend on S3 for checkpointing. I am observing that heap usage grows over time. The growth is not fast, but a gradual increase is visible. Over the same period, however, the total checkpoint size has not increased significantly.

What could be the cause of this? I understand that a heap dump can help, but since the increase happens over a long period of time, what else can be checked? Any pointers would be appreciated.

Checkpoint size:
Screenshot 2021-05-25 at 8.42.16 PM.png

Screenshots for two of the task managers:
Screenshot 2021-05-25 at 8.16.23 PM.png
Screenshot 2021-05-25 at 8.16.50 PM.png
 
Thanks,
Hemant

Re: Heap Usage increase gradually in Task Manager - Flink 1.9 on EMR

bat man
Any pointers on this?

Thanks.

On Tue, May 25, 2021 at 8:44 PM bat man <[hidden email]> wrote:
[...]

Re: Heap Usage increase gradually in Task Manager - Flink 1.9 on EMR

Dawid Wysakowicz-2

Hi,

It's rather hard to guess what the reason could be. Given that the checkpoint size does not increase, I'd assume the growth comes from some data you keep somewhere in your KeyedProcessFunction, outside of checkpointed state.
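
To illustrate that line of reasoning, here is a minimal sketch in plain Python, not Flink code. It assumes a hypothetical operator (`JoinOperator`) with two containers: `state`, standing in for managed keyed state that a checkpoint would snapshot, and `cache`, standing in for an ordinary member field that lives on the heap but is never checkpointed. The bug shown (remembering every event id in a field) is invented for illustration and is not known to be what the job actually does.

```python
import pickle


class JoinOperator:
    """Toy stand-in for a keyed join operator (not Flink's API)."""

    def __init__(self):
        # Stand-in for managed keyed state: this is what a checkpoint snapshots.
        self.state = {}
        # Stand-in for a plain member field (e.g. a lookup cache): it lives on
        # the heap but is never part of a checkpoint.
        self.cache = {}

    def process_slow(self, key, value):
        # Slow stream: remember the latest value per key in state.
        self.state[key] = value

    def process_fast(self, event_id, key, payload):
        # Fast stream: join against state; emit the joined pair or None.
        # Hypothetical bug: every event id is stored in a field, never removed.
        self.cache[event_id] = payload
        ref = self.state.get(key)
        return None if ref is None else (payload, ref)

    def checkpoint_size(self):
        # Only the managed state is serialized.
        return len(pickle.dumps(self.state))


def run(num_events):
    op = JoinOperator()
    for k in range(10):
        op.process_slow(k, f"ref-{k}")
    for i in range(num_events):
        op.process_fast(i, i % 10, f"event-{i}")
    return op.checkpoint_size(), len(op.cache)


small_ckpt, small_cache = run(1_000)
large_ckpt, large_cache = run(100_000)
print(small_ckpt, small_cache)
print(large_ckpt, large_cache)
```

In this toy setup the simulated checkpoint size is identical for both runs, while the number of entries held in the field grows with the number of events processed. A pattern like that would match a slowly rising heap next to a flat checkpoint size.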

Best,

Dawid

On 26/05/2021 09:24, bat man wrote:
[...]