CodeCache is full - Issues with job deployments


CodeCache is full - Issues with job deployments

PedroMrChaves
Hello,

Every time I deploy a Flink job the code cache increases, which is expected.
However, when I stop and start the job, or when it restarts, the code cache
continues to increase.

Screenshot_2018-12-11_at_11.png
<http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/file/t612/Screenshot_2018-12-11_at_11.png>  


I've added the flags "-XX:+PrintCompilation -XX:ReservedCodeCacheSize=350m
-XX:-UseCodeCacheFlushing" to the Flink taskmanagers and jobmanagers, but the
cache doesn't shrink much, as the screenshot above shows.
Even if I stop all the jobs, the cache doesn't decrease.

This gets to a point where I get the error "CodeCache is full. Compiler has
been disabled".
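
For readers who want to track the same numbers without a screenshot, here is a minimal sketch that reads code cache usage from the standard JMX memory pools. It assumes a HotSpot JVM, where Java 8 exposes a single pool named "Code Cache" and newer JVMs with a segmented code cache expose several "CodeHeap ..." pools. The class name `CodeCacheReport` is hypothetical, and the code only reports on the JVM it runs in, so for Flink it would have to run inside the taskmanager process or be replaced by a remote JMX query against the same beans.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryUsage;
import java.util.ArrayList;
import java.util.List;

public class CodeCacheReport {

    /** Returns the memory pools of this JVM that hold JIT-compiled code. */
    static List<MemoryPoolMXBean> codeCachePools() {
        List<MemoryPoolMXBean> pools = new ArrayList<>();
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            String name = pool.getName();
            // Assumption: HotSpot pool names ("Code Cache" on Java 8,
            // "CodeHeap ..." on JVMs with a segmented code cache).
            if (name.contains("Code Cache") || name.contains("CodeHeap")) {
                pools.add(pool);
            }
        }
        return pools;
    }

    public static void main(String[] args) {
        for (MemoryPoolMXBean pool : codeCachePools()) {
            MemoryUsage usage = pool.getUsage();
            // getMax() returns -1 when the JVM reports no defined maximum.
            System.out.printf("%s: used=%d bytes, max=%d bytes%n",
                    pool.getName(), usage.getUsed(), usage.getMax());
        }
    }
}
```

Logging these values before and after each job restart would show whether usage really stays flat once the jobs are stopped.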

I've attached the taskmanager output with the "-XX:+PrintCompilation" flag
enabled.

flink-flink-taskexecutor.out
<http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/file/t612/flink-flink-taskexecutor.out>  

Flink: 1.6.2
Java:  openjdk version "1.8.0_191"

Best Regards,
Pedro Chaves.




--
Sent from: http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/

Re: CodeCache is full - Issues with job deployments

Stefan Richter
Hi,

In general, Flink uses a user-code class loader for job-specific code, and the lifecycle of that class loader should end with the job. This usually means that job-related code can be removed once the job has finished. For that to happen, however, no object of a class loaded by the user-code class loader may still be referenced from anywhere after the job has finished; otherwise the user-code class loader cannot be freed. Whether that holds depends on the user code and its dependencies: for example, the user code might register some objects somewhere and not remove them when the job ends. This would prevent the user-code class loader from being freed and result in a leak. To find the root cause, you can analyse a heap dump for leaking class loaders; [1] and other sources on the web go deeper into this topic.
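
The "register some objects somewhere" case can be illustrated with a small plain-Java sketch. Everything in it is hypothetical and not Flink API: `GLOBAL_LISTENERS` stands in for a JVM-wide registry owned by a library on the parent class path, and `UserFunction` stands in for job code loaded by the user-code class loader. As long as the registry holds the object, the object's class and therefore its class loader stay reachable; deregistering when the job ends breaks that chain.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

public class LeakSketch {

    /** Hypothetical JVM-wide registry that outlives any single job. */
    static final Set<Runnable> GLOBAL_LISTENERS = ConcurrentHashMap.newKeySet();

    /** Hypothetical stand-in for job code loaded by the user-code class loader. */
    static class UserFunction implements Runnable, AutoCloseable {

        void open() {
            // From here on the registry references this object, and through
            // it the object's class and that class's class loader.
            GLOBAL_LISTENERS.add(this);
        }

        @Override
        public void run() {
            // Callback body is irrelevant for the sketch.
        }

        @Override
        public void close() {
            // Skipping this removal is the leak: the registry would keep the
            // user-code class loader reachable after the job has finished.
            GLOBAL_LISTENERS.remove(this);
        }
    }

    public static void main(String[] args) {
        UserFunction fn = new UserFunction();
        fn.open();
        System.out.println("registered while running: " + GLOBAL_LISTENERS.size());
        fn.close();
        System.out.println("registered after close: " + GLOBAL_LISTENERS.size());
    }
}
```

The same reasoning applies to threads, timers, and JDBC drivers started or registered by user code: each needs an explicit shutdown or deregistration when the job ends.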

Best,
Stefan


On 11. Dec 2018, at 12:56, PedroMrChaves <[hidden email]> wrote:


Re: CodeCache is full - Issues with job deployments

PedroMrChaves
Hello Stefan,

Thank you for the reply.

I've taken a heap dump from a development cluster using jmap and analysed
it. For the test we restarted the cluster and then left a job running for
a few minutes. After that, we restarted the job a couple of times and
stopped it. After leaving the cluster with no running jobs for 20 min, we
took a heap dump.
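
The dump above was taken with jmap. As an in-process alternative, a heap dump can also be triggered through the HotSpot diagnostic MXBean. This is only a sketch, assuming a HotSpot-based JVM: the class name `HeapDumper` and the file names are hypothetical, and the code dumps the JVM it runs in, so it would have to execute inside the taskmanager process to be useful here.

```java
import com.sun.management.HotSpotDiagnosticMXBean;
import java.io.IOException;
import java.lang.management.ManagementFactory;
import java.nio.file.Files;
import java.nio.file.Path;

public class HeapDumper {

    /** Writes a heap dump of the current JVM to the given file. */
    static void dumpLiveObjects(Path target) throws IOException {
        HotSpotDiagnosticMXBean diagnostics =
                ManagementFactory.getPlatformMXBean(HotSpotDiagnosticMXBean.class);
        // 'true' restricts the dump to reachable objects, which is what
        // matters when looking for class loaders that are still referenced.
        // The call fails if the target file already exists.
        diagnostics.dumpHeap(target.toString(), true);
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("flink-heapdump");
        Path target = dir.resolve("taskmanager.hprof");
        dumpLiveObjects(target);
        System.out.println("Heap dump written to " + target
                + " (" + Files.size(target) + " bytes)");
    }
}
```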

We found that a thread which consumes data from Kafka was still
running, with a lot of finalizer calls, as depicted below.


<http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/file/t612/Screenshot_2018-12-11_at_17.png>

I will deploy a job without a Kafka consumer to see if the code cache still
increases (all of our clusters have problems with the code cache and,
coincidentally, all of the deployed jobs read from Kafka).
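
A quick way to check for threads that survive a stopped job, without a full heap dump, is to list the live threads of the JVM together with their context class loaders. The sketch below uses only standard JDK calls; the class name `LiveThreadReport` is hypothetical, and it has to run inside the taskmanager JVM to see that process's threads. A thread that is still alive after all jobs have stopped and whose context class loader is a job-specific loader would be a candidate for the kind of leak described earlier in the thread.

```java
import java.util.Map;
import java.util.TreeMap;

public class LiveThreadReport {

    /** Maps each live thread (name and id) to the class name of its context class loader. */
    static Map<String, String> threadsWithClassLoaders() {
        Map<String, String> result = new TreeMap<>();
        for (Thread thread : Thread.getAllStackTraces().keySet()) {
            ClassLoader loader = thread.getContextClassLoader();
            String key = thread.getName() + " (id " + thread.getId() + ")";
            // A null context class loader is possible, e.g. for some JVM-internal threads.
            result.put(key, loader == null ? "<none>" : loader.getClass().getName());
        }
        return result;
    }

    public static void main(String[] args) {
        for (Map.Entry<String, String> entry : threadsWithClassLoaders().entrySet()) {
            System.out.println(entry.getKey() + " -> " + entry.getValue());
        }
    }
}
```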


Best Regards,
Pedro Chaves




Re: CodeCache is full - Issues with job deployments

Stefan Richter
Hi,

Thanks for analyzing the problem. If it turns out that there is a problem with the termination of the Kafka sources, could you please open an issue for that with your results?

Best,
Stefan

> On 11. Dec 2018, at 19:04, PedroMrChaves <[hidden email]> wrote: