JobManager container is running beyond physical memory limits

eSKa
Hello,
after switching from 1.4.2 to 1.5.2 we started having problems with the JM container.
Our use case is as follows:
- we get a request from a user
- run a DataProcessing job
- once it finishes, we store the details in the DB

We have ~1000 jobs per day. After the version update our container dies after ~1-2 days. Previously it ran for weeks without a problem.
We reduced our web.history from 100 to 32, but that didn't help much.
Do you have any suggestions on what we could do? The JM has 4 GB of memory assigned. We will test today with 5 or 6 GB, but I have a feeling that will only delay the moment of the crash.
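For reference, a minimal sketch of how the JobManager container size is typically set for a long-running Flink-on-YARN session (the flag values and the heap-cutoff key below are illustrative assumptions, not settings taken from this thread):

    # Start a detached YARN session: 4 GB JobManager container, 4 GB per TaskManager
    ./bin/yarn-session.sh -n 4 -s 4 -jm 4096 -tm 4096 -d

    # Only part of the container size becomes JVM heap; the rest is kept as a
    # "cutoff" for off-heap/native memory (flink-conf.yaml, default around 0.25):
    # containerized.heap-cutoff-ratio: 0.25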
Re: JobManager container is running beyond physical memory limits

Yun Tang
Hi

If your JM's container is killed by YARN for exceeding its physical memory limit and your job's code has not changed apart from bumping the Flink version, I think you could use the jmap command to dump the memory of your JobManager and compare 1.4.2 with 1.5.2, and you could also enable the GC log and compare that as well.
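A rough sketch of what that could look like, assuming you can log on to the node running the JM container (the PID and paths below are placeholders):

    # Find the JobManager JVM inside the YARN container
    jps -l

    # Quick first look: histogram of live objects on the heap
    jmap -histo:live <jm-pid> | head -n 30

    # Full heap dump for offline analysis (Eclipse MAT, VisualVM, ...)
    jmap -dump:live,format=b,file=/tmp/jm-heap.hprof <jm-pid>

    # GC logging can be enabled via flink-conf.yaml, e.g. with Java 8 flags:
    # env.java.opts.jobmanager: "-XX:+PrintGCDetails -XX:+PrintGCDateStamps -Xloggc:/tmp/jm-gc.log"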

In my experience, "jobmanager.execution.attempts-history-size" being configured too large, operator names that are too long for the metrics stored in the JM, or even an unconfigured checkpoint path (which always returns a ByteStreamStateHandle back to the JM) can all increase the JM's memory footprint.
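For reference, the corresponding flink-conf.yaml entries might look roughly like this (the values and the HDFS path are only placeholders):

    # Keep fewer completed execution attempts around (the default is 16)
    jobmanager.execution.attempts-history-size: 16

    # Point checkpoints at a real filesystem so that state is written out
    # instead of being returned to the JM as a ByteStreamStateHandle
    state.backend: filesystem
    state.checkpoints.dir: hdfs:///flink/checkpoints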

Best
Yun

Re: JobManager container is running beyond physical memory limits

eSKa
We don't set it anywhere, so I guess it's the default of 16. Do you think that's too much?



Re: JobManager container is running beyond physical memory limits

Till Rohrmann
Hi,

What changed between version 1.4.2 and 1.5.2 was the addition of the application-level flow control mechanism, which changed a bit how the network buffers are configured. This could be a potential culprit.
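For reference, the TaskManager-side keys that size the network buffer pool since 1.5 are listed below (defaults quoted from memory, so treat them as approximate):

    # flink-conf.yaml: network buffer pool sizing
    taskmanager.network.memory.fraction: 0.1          # fraction of TM memory used for network buffers
    taskmanager.network.memory.min: 67108864          # lower bound, 64 MB
    taskmanager.network.memory.max: 1073741824        # upper bound, 1 GB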

Since you said that the container ran for some time, I'm wondering whether there is a resource leak somewhere. In order to debug this, a heap dump would be tremendously helpful.

Cheers,
Till

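One way to make sure a heap dump gets captured is sketched below, assuming the JM heap itself fills up before YARN kills the container (if the overage is purely off-heap these flags will not fire; the dump path is a placeholder):

    # flink-conf.yaml: write a heap dump automatically on an OutOfMemoryError
    env.java.opts.jobmanager: "-XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/tmp/jm-oom.hprof"

    # Alternatively, take a jmap dump on day 1 and day 2 and compare the
    # dominant classes in a tool like Eclipse MAT to spot what keeps growing.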