Re: Task Manager was lost/killed due to full GC

Posted by Fabian Hueske-2 on
URL: http://deprecated-apache-flink-user-mailing-list-archive.369.s1.nabble.com/Task-Manager-was-lost-killed-due-to-full-GC-tp15386p16258.html

Thanks for the heads-up and for explaining how you resolved the issue!

Best, Fabian

2017-10-18 3:50 GMT+02:00 ShB <[hidden email]>:
I just wanted to leave an update on this issue for anyone else who might
come across it. The problem was with storage, but it was disk space and not
heap/off-heap memory. YARN was killing off my containers because they exceeded
the threshold for disk utilization, and this manifested as "Task manager
was lost/killed" or "JobClientActorConnectionTimeoutException: Lost connection
to the JobManager". Digging into the node manager logs on the individual
instances provided some hints that it was a disk issue.

Some fixes for this problem:
- Increase yarn.nodemanager.disk-health-checker.max-disk-utilization-per-disk-percentage
  to alleviate the problem temporarily.
- Increase the disk capacity on each task manager; this is a more long-term fix.
- Increase the number of task managers, which increases the total available
  disk space and hence is also a fix.
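
For anyone who wants to check how close a node is to that threshold before
containers start disappearing, here is a minimal sketch in Python. It is not
part of YARN or Flink. The function names are made up for illustration, the
90.0 default is the documented YARN default for the setting above (check your
own yarn-site.xml), and the directory list is a placeholder for whatever your
yarn.nodemanager.local-dirs and log dirs point to. The percentage it computes
is only an approximation of what the node manager itself measures.

```python
import shutil

# Assumption: 90.0 is the YARN default for
# yarn.nodemanager.disk-health-checker.max-disk-utilization-per-disk-percentage.
# Replace it with the value configured on your cluster.
DEFAULT_MAX_UTILIZATION_PERCENT = 90.0


def utilization_percent(used_bytes, total_bytes):
    """Return used space as a percentage of total space."""
    if total_bytes <= 0:
        raise ValueError("total_bytes must be positive")
    return 100.0 * used_bytes / total_bytes


def exceeds_threshold(used_bytes, total_bytes,
                      max_percent=DEFAULT_MAX_UTILIZATION_PERCENT):
    """True if utilization is strictly above the configured threshold."""
    return utilization_percent(used_bytes, total_bytes) > max_percent


def check_directories(paths, max_percent=DEFAULT_MAX_UTILIZATION_PERCENT):
    """Map each path to (utilization in percent, whether it is over the threshold)."""
    report = {}
    for path in paths:
        usage = shutil.disk_usage(path)  # named tuple: total, used, free
        percent = utilization_percent(usage.used, usage.total)
        report[path] = (percent, percent > max_percent)
    return report


if __name__ == "__main__":
    # Placeholder: list the node manager's local and log directories here.
    for path, (percent, over) in check_directories(["."]).items():
        status = "OVER THRESHOLD" if over else "ok"
        print(f"{path}: {percent:.1f}% used ({status})")
```

Running this on each task manager host shows which disks are near the limit,
which helps decide between raising the threshold and adding disk capacity.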

Thanks!