(DEPRECATED) Apache Flink User Mailing List archive.

Heap Usage increase gradually in Task Manager - Flink 1.9 on EMR


bat man

Heap Usage increase gradually in Task Manager - Flink 1.9 on EMR

Hi Flink Users,

I am running a Flink 1.9 streaming job on YARN (AWS EMR).

The job does operations like

stream 1 - stream 2 join -> stream 3

stream 3 - stream 4 join -> sink

So basically, stream 1 is fast-moving data, while streams 2 and 4 are less frequent. I am using KeyedProcessFunction to do the joins. We also have other operations, such as dynamically keying the data from stream 1 based on event type and filtering out records when mandatory data is not available.

The job uses FsStateBackend on S3 for checkpointing. I am observing that heap usage grows over time. It is not growing fast, but a gradual increase is visible. However, over the same period the total checkpoint size has not increased significantly.

What could be the cause of this? I understand that a heap dump can help, but since the increase happens over a long period of time, what are the things that can be checked? Any pointers would be appreciated.
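For a slow increase like this, one option short of a full heap dump is to log how much heap is still occupied after garbage collection and watch the trend over hours or days. The following is a minimal sketch using only the JDK's `java.lang.management` API. The class name `PostGcHeapSampler` and its methods are hypothetical helpers, not part of Flink, and where to call them from (for example a periodic logging thread inside the job) is an assumption.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryType;
import java.lang.management.MemoryUsage;

/** Hypothetical helper (not a Flink class): reports heap occupancy of the current JVM. */
class PostGcHeapSampler {

    /** Bytes still occupied in heap pools after their most recent garbage collection. */
    static long usedAfterLastGcBytes() {
        long total = 0;
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            if (pool.getType() != MemoryType.HEAP) {
                continue; // skip metaspace, code cache and other non-heap pools
            }
            MemoryUsage afterGc = pool.getCollectionUsage(); // null if the pool does not support it
            if (afterGc != null) {
                total += afterGc.getUsed();
            }
        }
        return total;
    }

    /** Bytes of heap currently in use, including garbage that has not been collected yet. */
    static long currentHeapUsedBytes() {
        return ManagementFactory.getMemoryMXBean().getHeapMemoryUsage().getUsed();
    }
}
```

The post-GC figure is the more useful one for spotting a leak, because the current figure also rises and falls with ordinary garbage. If the post-GC figure keeps rising, comparing two `jmap -histo:live <pid>` class histograms of the TaskManager JVM taken some hours apart can show which classes account for the growth.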

Checkpoint size -

Screenshot below for 2 of the task managers -

Thanks,

Hemant

bat man

Re: Heap Usage increase gradually in Task Manager - Flink 1.9 on EMR

Any pointers on this?

Thanks.

On Tue, May 25, 2021 at 8:44 PM bat man <[hidden email]> wrote:


Dawid Wysakowicz-2

Re: Heap Usage increase gradually in Task Manager - Flink 1.9 on EMR

Hi,

It's rather hard to guess what the reason could be. Given that the checkpoint size does not increase, I'd assume it is some data you keep somewhere in your KeyedProcessFunction.
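To illustrate the point with a toy model (plain Java, not the Flink API; all names are hypothetical): with FsStateBackend, keyed state also lives on the TaskManager heap, but it is written into every checkpoint. So anything that grows the heap without growing the checkpoint is likely held outside Flink state, for example in an ordinary field of the function.

```java
import java.util.HashMap;
import java.util.Map;

/** Toy model, not Flink API: contrasts checkpointed state with a field-held cache. */
class JoinOperatorModel {

    // Stands in for Flink keyed state: this is what a checkpoint would serialize.
    private final Map<String, String> checkpointedState = new HashMap<>();

    // Stands in for a plain Java field inside the function: on the heap, never checkpointed.
    private final Map<String, String> localCache = new HashMap<>();

    /** Slow stream: keep only the latest reference value per key. */
    void onSlowStream(String key, String reference) {
        checkpointedState.put(key, reference);
    }

    /** Fast stream: join against the reference value, then (the bug) remember every event. */
    String onFastStream(String eventId, String key) {
        String joined = key + ":" + checkpointedState.get(key);
        localCache.put(eventId, joined); // one entry per event, never removed
        return joined;
    }

    /** Number of entries a checkpoint would contain in this model. */
    int checkpointEntries() {
        return checkpointedState.size();
    }

    /** Number of entries that occupy heap but are invisible to checkpoints. */
    int heapOnlyEntries() {
        return localCache.size();
    }
}
```

After 1,000 fast-stream events for a single key, the model's checkpointed state still holds one entry while the field-held cache holds 1,000. In a real job the same pattern could hide in a `HashMap` or `List` field, a static cache, or a per-event-type lookup that is never cleared.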

Best,

Dawid

On 26/05/2021 09:24, bat man wrote:

