http://deprecated-apache-flink-user-mailing-list-archive.369.s1.nabble.com/Flink-1-8-3-Kubernetes-POD-OOM-tp35324.html
Cluster type: Standalone cluster
Job Type: Streaming
JVM memory: 26.2 GB
POD memory: 33 GB
CPU: 10 Cores
GC: G1GC
Flink Version: 1.8.3
State backend: file-based
NETWORK_BUFFERS_MEMORY_FRACTION: 0.02f of the heap
We are not accessing direct memory from the application; only Flink uses direct memory.
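For context, that setup maps roughly onto the following flink-conf.yaml entries (a sketch only: the heap size is rounded from the figure above and the checkpoint directory is a placeholder, not our actual path):

    taskmanager.heap.size: 26214m              # ~26.2 GB JVM heap
    taskmanager.network.memory.fraction: 0.02  # NETWORK_BUFFERS_MEMORY_FRACTION
    state.backend: filesystem                  # file-based state backend
    state.checkpoints.dir: file:///checkpoints # placeholder path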
We notice that in Flink 1.8.3, over a period of 30 minutes, the POD is killed with an OOM, while the JVM heap stays within its limit.
We read from Kafka and have windows in the application. Our sink is either Kafka or Elasticsearch.
The same application/job was working perfectly in Flink 1.4.1 with the same input and output rates. There is no back pressure.
I have attached a few Grafana charts as a PDF. Any idea why the off-heap memory / memory outside the JVM keeps growing and eventually reaches the limit?
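For reference, the breakdown below comes from the JVM's Native Memory Tracking, which can be reproduced with standard JDK tooling (nothing Flink-specific), e.g.:

    # enable NMT via the TaskManager JVM options (for example in env.java.opts)
    -XX:NativeMemoryTracking=summary
    # then query the running JVM inside the pod
    jcmd <pid> VM.native_memory summary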
- Java Heap (reserved=26845184KB, committed=26845184KB)
    (mmap: reserved=26845184KB, committed=26845184KB)
- Class (reserved=1241866KB, committed=219686KB)
    (classes #36599)
    (malloc=4874KB #74568)
    (mmap: reserved=1236992KB, committed=214812KB)
- Thread (reserved=394394KB, committed=394394KB)
    (thread #383)
    (stack: reserved=392696KB, committed=392696KB)
    (malloc=1250KB #1920)
    (arena=448KB #764)
- Code (reserved=272178KB, committed=137954KB)
    (malloc=22578KB #33442)
    (mmap: reserved=249600KB, committed=115376KB)
- GC (reserved=1365088KB, committed=1365088KB)
    (malloc=336112KB #1130298)
    (mmap: reserved=1028976KB, committed=1028976KB)
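Since we believe only Flink allocates direct memory, one sanity check (not something wired into our job, just a sketch using plain JDK MXBeans) is to periodically dump the direct and mapped buffer pools and compare them against the pod's RSS:

import java.lang.management.BufferPoolMXBean;
import java.lang.management.ManagementFactory;

public class DirectMemoryProbe {
    public static void main(String[] args) throws InterruptedException {
        // Print the JVM's "direct" and "mapped" buffer pools every 10 seconds.
        // Flink's network buffers are allocated as direct ByteBuffers, so a
        // steadily growing "direct" pool would point at direct-memory growth,
        // while a flat pool points at other native allocations (thread stacks,
        // GC structures, glibc malloc arenas, etc.).
        while (true) {
            for (BufferPoolMXBean pool :
                    ManagementFactory.getPlatformMXBeans(BufferPoolMXBean.class)) {
                System.out.printf("%s: count=%d, used=%d bytes, capacity=%d bytes%n",
                        pool.getName(), pool.getCount(),
                        pool.getMemoryUsed(), pool.getTotalCapacity());
            }
            Thread.sleep(10_000);
        }
    }
}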