After a batch job finishes in a Flink standalone cluster, I notice that the memory isn't freed up. I understand that Flink uses its own memory manager and allocates a large tenured byte array that is not GC'ed. But does the memory used in this byte array get released when the batch job is done?
The scenario I am facing is that I am running a series of scheduled batch jobs on a standalone cluster with 1 TM and 1 slot. I notice that after a job completes, the memory used in the TM isn't freed up. I can confirm this by running a jmap dump. Has anyone else run into this issue? This is on 1.9. Thanks Tim
Hi Tim, Do you mean that the user heap memory used by the tasks of finished jobs is not freed up? If that were the case, the memory usage of the TaskManager would keep increasing as more and more jobs finished. However, this does not happen; the memory is freed by the JVM GC. BTW, Flink has its own memory management strategy, covering task heap/off-heap, framework heap, JVM overhead and so on. Why do you care about when the memory is freed? I think it will be done automatically by Flink and the JVM. Best, Yang On Thu, Oct 10, 2019 at 7:55 PM, Timothy Victor <[hidden email]> wrote:
|
I think it depends on your configuration.
- Are you using on-heap or off-heap managed memory? (configured by 'taskmanager.memory.off-heap', false by default)
- Is managed memory pre-allocated? (configured by 'taskmanager.memory.preallocate', false by default)
If managed memory is pre-allocated, the allocated memory segments will never be released. If it is not pre-allocated, memory segments should be released when the task finishes, but the actual memory will not be de-allocated until the next GC. Since the job is finished, there may not be enough heap activity to trigger a GC.
If on-heap memory is used, you may not be able to observe a decrease in TM memory usage, because the JVM heap size does not scale down. Only if off-heap memory is used might you be able to observe a decrease in TM memory usage after a GC, but not from a jmap dump, because jmap dumps heap memory usage only.
Besides, I don't think you need to worry about whether memory is released after a job finishes. Sometimes Flink or the JVM does not release memory after jobs/tasks finish, so that it can be reused directly by other jobs/tasks, which reduces allocation/deallocation overhead and improves performance.
Thank you~ Xintong Song On Thu, Oct 10, 2019 at 7:55 PM Timothy Victor <[hidden email]> wrote:
|
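The difference between pre-allocated and lazily allocated segments described in the reply above can be illustrated with a toy pool. This is only a sketch of the general idea, not Flink's actual MemoryManager: the class `SegmentPool`, its methods, and the `preallocate` flag are hypothetical names chosen for illustration.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical toy pool, NOT Flink's MemoryManager.
class SegmentPool {
    private final int segmentSize;
    private final int maxSegments;
    private final boolean preallocate;
    private final Deque<byte[]> free = new ArrayDeque<>();
    private int outstanding = 0;

    SegmentPool(int segmentSize, int maxSegments, boolean preallocate) {
        this.segmentSize = segmentSize;
        this.maxSegments = maxSegments;
        this.preallocate = preallocate;
        if (preallocate) {
            // All segments are created up front and stay referenced by the pool.
            for (int i = 0; i < maxSegments; i++) {
                free.push(new byte[segmentSize]);
            }
        }
    }

    byte[] allocate() {
        if (outstanding == maxSegments) {
            throw new IllegalStateException("pool exhausted");
        }
        outstanding++;
        // Pre-allocated: hand out a pooled segment. Lazy: create one on demand.
        return preallocate ? free.pop() : new byte[segmentSize];
    }

    void release(byte[] segment) {
        outstanding--;
        if (preallocate) {
            // The pool keeps the reference, so the GC can never reclaim it.
            free.push(segment);
        }
        // Lazy mode: the pool drops the segment; it becomes garbage once the
        // caller drops its reference too, and is reclaimed at some later GC.
    }

    // Number of idle segments the pool is still holding on to.
    int retainedSegments() {
        return free.size();
    }
}
```

In the pre-allocating mode the pool still references every segment after a task releases it, so no GC can reclaim that memory. In the lazy mode nothing is retained, but the bytes are only reclaimed whenever the next collection happens to run.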
Thanks Xintong! In my case both of those parameters are set to false (the default). I think I am sort of following what's happening here. I have one TM with its heap size set to 1GB. When the cluster is started, the TM doesn't use that 1GB (no allocations). Once the first batch job is submitted, I can see the memory go up by roughly 1GB. I presume this is when the TM allocates its 1GB on the heap, and if I read correctly this is essentially a large byte buffer that is tenured so that it is never GC'ed. Flink serializes any POJOs into this byte buffer, essentially to circumvent GC for performance. Once the job is done, this byte buffer remains on the heap, and the TaskManager keeps it for use by the next batch job. This is why I never see the memory go down after a batch job completes. Does this make sense? Please let me know what you think. Thanks Tim On Thu, Oct 10, 2019, 11:16 PM Xintong Song <[hidden email]> wrote:
|
This part about the GC not cleaning up after the job finishes makes sense. However, I observed that even after I run "jcmd <pid> GC.run" against the TaskManager process ID, the memory is still not released. This is what concerns me. Tim On Sat, Oct 12, 2019, 2:53 AM Xintong Song <[hidden email]> wrote:
|
A forced GC does not mean that the JVM will even try to release the freed memory back to the operating system. This depends heavily on the JVM and garbage collector used in your Flink setup, but most probably it's JVM 8 with the ParallelGC collector. ParallelGC is known to be not that aggressive about releasing free heap memory back to the OS. I see several possible solutions here:
1. Ask yourself why you really need to release any memory back. Is there a logical reason behind it? The next time you submit the job, the memory is going to be reused.
2. You can switch to G1GC and use JVM args like "-XX:MaxHeapFreeRatio -XX:MinHeapFreeRatio" to make it more aggressive about releasing memory.
3. You can use unofficial JVM builds from RedHat with the ShenandoahGC backport, which is also able to do the job: https://builds.shipilev.net/openjdk-shenandoah-jdk8/
4. Flink 1.10 will (hopefully) be able to run on JVM 11, where G1 is much more aggressive about releasing memory: https://bugs.openjdk.java.net/browse/JDK-8146436
Roman Grebennikov | [hidden email] On Sat, Oct 12, 2019, at 08:38, Timothy Victor wrote:
|
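The point that a forced GC frees heap inside the JVM without necessarily returning it to the OS can be checked with the standard `java.lang.management` API, which reports used and committed heap separately. The following is a minimal sketch; the class name `HeapProbe` and the 64 MB buffer size are arbitrary choices for illustration, and the numbers printed will vary with the JVM version, collector, and heap settings.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryUsage;

// Hypothetical helper for observing used vs. committed heap.
class HeapProbe {
    static final long MB = 1024L * 1024L;

    // Returns {usedMB, committedMB} for the JVM heap.
    static long[] heapMb() {
        MemoryUsage heap = ManagementFactory.getMemoryMXBean().getHeapMemoryUsage();
        return new long[] { heap.getUsed() / MB, heap.getCommitted() / MB };
    }

    static void report(String label) {
        long[] h = heapMb();
        System.out.printf("%-18s used=%d MB committed=%d MB%n", label, h[0], h[1]);
    }

    public static void main(String[] args) {
        report("startup");

        // Stand-in for a job's working memory (size chosen arbitrarily).
        byte[] buffer = new byte[64 * 1024 * 1024];
        buffer[0] = 1;
        report("after allocation");

        // Drop the only reference and request a collection.
        buffer = null;
        System.gc();
        report("after System.gc()");
    }
}
```

"used" is what live and not-yet-collected objects occupy, while "committed" is what the JVM has reserved from the OS. Tools such as top report something closer to the committed figure, so a TM can look just as large after a GC even when most of its heap is free.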
Thanks for the insight Roman, and also for the GC tips. There are 2 reasons why I wanted to see this memory released. The first was just to confirm my understanding of Flink's memory segment handling. The second is that I run a single standalone cluster that runs both streaming and batch jobs, and this cluster was being killed by the OOM killer (i.e. the Java process was killed; it was not a JVM exception). For the second part, I did some napkin calculations and tuned down the number of TMs on the host. This seems to help a bit, since what was happening before was that subsequent batch jobs were being scheduled on fresh TMs which had not allocated memory before. So as more TMs did work, more memory was used but never released, and eventually the OS OOM killer stepped in. My direction now (thanks to all I learned and the input in this thread) is to:
a) Not run streaming and batch jobs on the same cluster. Their memory models are different enough that this is not a good thing, and I certainly don't want a streaming job to be impacted by a running batch job.
b) Move the batch jobs to a Job Cluster setup running in K8s. I have had a lot of trouble getting this to run stably due to K8s issues, but I think I am very close now.
Thanks again Tim On Mon, Oct 14, 2019, 3:08 AM Roman Grebennikov <[hidden email]> wrote: