Re: Flink 1.8.3 Kubernetes POD OOM

Posted by Andrey Zagrebin-5 on
URL: http://deprecated-apache-flink-user-mailing-list-archive.369.s1.nabble.com/Flink-1-8-3-Kubernetes-POD-OOM-tp35324p35357.html

Hi Josson,

Do you use a state backend? Is it RocksDB?

Best,
Andrey

On Fri, May 22, 2020 at 12:58 PM Fabian Hueske <[hidden email]> wrote:
Hi Josson,

I don't have much experience setting memory bounds in Kubernetes myself, but my colleague Andrey (in CC) reworked Flink's memory configuration for the last release to ease the configuration in container envs.
He might be able to help.

Best, Fabian

On Thu, May 21, 2020 at 6:43 PM Josson Paul <[hidden email]> wrote:
Cluster type: Standalone cluster
Job Type: Streaming
JVM memory: 26.2 GB
POD memory: 33 GB
CPU: 10 Cores
GC: G1GC
Flink Version: 1.8.3
State back end: File based
NETWORK_BUFFERS_MEMORY_FRACTION : 0.02f of the Heap
We are not accessing direct memory from the application. Only Flink uses direct memory.

We notice that with Flink 1.8.3 the POD is killed with OOM over a period of 30 minutes. The JVM heap is within its limit.
We read from Kafka and use windows in the application. Our sink is either Kafka or Elasticsearch.
The same application/job was working perfectly in Flink 1.4.1 with the same input and output rates.
There is no back pressure.
I have attached a few Grafana charts as a PDF.
Any idea why the off-heap memory / memory outside the JVM is going up and eventually reaching the limit?

- Java Heap (reserved=26845184KB, committed=26845184KB)
(mmap: reserved=26845184KB, committed=26845184KB)

- Class (reserved=1241866KB, committed=219686KB)
(classes #36599)
(malloc=4874KB #74568)
(mmap: reserved=1236992KB, committed=214812KB)

- Thread (reserved=394394KB, committed=394394KB)
(thread #383)
(stack: reserved=392696KB, committed=392696KB)
(malloc=1250KB #1920)
(arena=448KB #764)

- Code (reserved=272178KB, committed=137954KB)
(malloc=22578KB #33442)
(mmap: reserved=249600KB, committed=115376KB)

- GC (reserved=1365088KB, committed=1365088KB)
(malloc=336112KB #1130298)
(mmap: reserved=1028976KB, committed=1028976KB)
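
For comparison with the POD limit, the committed sizes in the listing above (which looks like JVM Native Memory Tracking output) can be totaled. This is only a minimal sketch: the function name, the regex, and the pasted-in string are illustrative assumptions, not part of Flink or the JDK, and it covers only the five categories shown here.

```python
import re

# Category header lines copied from the listing above (sub-lines omitted).
NMT_SUMMARY = """\
- Java Heap (reserved=26845184KB, committed=26845184KB)
- Class (reserved=1241866KB, committed=219686KB)
- Thread (reserved=394394KB, committed=394394KB)
- Code (reserved=272178KB, committed=137954KB)
- GC (reserved=1365088KB, committed=1365088KB)
"""

# Hypothetical pattern: matches lines such as
# "- Class (reserved=1241866KB, committed=219686KB)".
# Sub-lines like "(mmap: ...)" start with "(" and are skipped.
CATEGORY_LINE = re.compile(
    r"^-?\s*([A-Za-z ]+?)\s*\(reserved=(\d+)KB, committed=(\d+)KB\)",
    re.MULTILINE,
)


def committed_by_category(summary):
    """Return {category name: committed KB} for each category header line."""
    return {m.group(1): int(m.group(3)) for m in CATEGORY_LINE.finditer(summary)}


sizes = committed_by_category(NMT_SUMMARY)
total_kb = sum(sizes.values())

for name, kb in sizes.items():
    print(f"{name:<10} {kb:>10} KB")
print(f"{'Total':<10} {total_kb:>10} KB ({total_kb / 1024 / 1024:.2f} GiB)")
```

The five categories add up to 28962306 KB committed, roughly 27.6 GiB. Memory allocated outside the JVM's own tracking (for example by native libraries) would not show up in such a listing at all.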



--
Thanks
Josson