http://deprecated-apache-flink-user-mailing-list-archive.369.s1.nabble.com/Flink-1-8-3-Kubernetes-POD-OOM-tp35324.html
Cluster type: Standalone cluster
Job Type: Streaming
JVM memory: 26.2 GB
POD memory: 33 GB
CPU: 10 Cores
GC: G1GC
Flink Version: 1.8.3
State backend: file-based
NETWORK_BUFFERS_MEMORY_FRACTION: 0.02f of the heap
We are not accessing direct memory from the application; only Flink uses direct memory.
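For context, that setup maps roughly onto the following flink-conf.yaml entries (a sketch only: the heap size is rounded from the figure above and the checkpoint directory is a placeholder, not our actual path):

    taskmanager.heap.size: 26214m              # ~26.2 GB JVM heap
    taskmanager.network.memory.fraction: 0.02  # NETWORK_BUFFERS_MEMORY_FRACTION
    state.backend: filesystem                  # file-based state backend
    state.checkpoints.dir: file:///checkpoints # placeholder path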
We notice that in Flink 1.8.3, over a period of 30 minutes, the POD is killed with an OOM, while the JVM heap stays within its limit.
We read from Kafka and have windows in the application. Our sink is either Kafka or Elasticsearch.
The same application/job was working perfectly in Flink 1.4.1 with the same input and output rates. There is no back pressure.
I have attached a few Grafana charts as a PDF. Any idea why the off-heap memory / memory outside the JVM keeps growing and eventually reaches the limit?
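For reference, the breakdown below comes from the JVM's Native Memory Tracking, which can be reproduced with standard JDK tooling (nothing Flink-specific), e.g.:

    # enable NMT via the TaskManager JVM options (for example in env.java.opts)
    -XX:NativeMemoryTracking=summary
    # then query the running JVM inside the pod
    jcmd <pid> VM.native_memory summary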
- Java Heap (reserved=26845184KB, committed=26845184KB)
    (mmap: reserved=26845184KB, committed=26845184KB)
- Class (reserved=1241866KB, committed=219686KB)
    (classes #36599)
    (malloc=4874KB #74568)
    (mmap: reserved=1236992KB, committed=214812KB)
- Thread (reserved=394394KB, committed=394394KB)
    (thread #383)
    (stack: reserved=392696KB, committed=392696KB)
    (malloc=1250KB #1920)
    (arena=448KB #764)
- Code (reserved=272178KB, committed=137954KB)
    (malloc=22578KB #33442)
    (mmap: reserved=249600KB, committed=115376KB)
- GC (reserved=1365088KB, committed=1365088KB)
    (malloc=336112KB #1130298)
    (mmap: reserved=1028976KB, committed=1028976KB)
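Since we believe only Flink allocates direct memory, one sanity check (not something wired into our job, just a sketch using plain JDK MXBeans) is to periodically dump the direct and mapped buffer pools and compare them against the pod's RSS:

import java.lang.management.BufferPoolMXBean;
import java.lang.management.ManagementFactory;

public class DirectMemoryProbe {
    public static void main(String[] args) throws InterruptedException {
        // Print the JVM's "direct" and "mapped" buffer pools every 10 seconds.
        // Flink's network buffers are allocated as direct ByteBuffers, so a
        // steadily growing "direct" pool would point at direct-memory growth,
        // while a flat pool points at other native allocations (thread stacks,
        // GC structures, glibc malloc arenas, etc.).
        while (true) {
            for (BufferPoolMXBean pool :
                    ManagementFactory.getPlatformMXBeans(BufferPoolMXBean.class)) {
                System.out.printf("%s: count=%d, used=%d bytes, capacity=%d bytes%n",
                        pool.getName(), pool.getCount(),
                        pool.getMemoryUsed(), pool.getTotalCapacity());
            }
            Thread.sleep(10_000);
        }
    }
}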