Flink 1.8.3 Kubernetes POD OOM


Flink 1.8.3 Kubernetes POD OOM

Josson Paul
Cluster type: Standalone cluster
Job Type: Streaming
JVM memory: 26.2 GB
POD memory: 33 GB
CPU: 10 Cores
GC: G1GC
Flink Version: 1.8.3
State backend: File based
NETWORK_BUFFERS_MEMORY_FRACTION: 0.02f of the heap
We are not accessing direct memory from the application; only Flink uses direct memory.

We notice that in Flink 1.8.3, over a period of 30 minutes, the POD is killed with OOM, while the JVM heap stays within its limit.
We read from Kafka and use windows in the application. Our sink is either Kafka or Elasticsearch.
The same application/job worked perfectly in Flink 1.4.1 with the same input and output rates.
There is no back pressure.
I have attached a few Grafana charts as a PDF.
Any idea why the off-heap / outside-JVM memory keeps growing and eventually reaches the limit?
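
For reference, the job has roughly the following shape (a minimal sketch only; topic names, the window size, the key selector, and the broker address below are illustrative, not our actual values):

// Minimal sketch of the job shape on the Flink 1.8 DataStream API.
// Topic names, window size, key selector and broker address are illustrative only.
import java.util.Properties;

import org.apache.flink.api.common.serialization.SimpleStringSchema;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.api.windowing.time.Time;
import org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumer;
import org.apache.flink.streaming.connectors.kafka.FlinkKafkaProducer;

public class StreamingJobSketch {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        env.enableCheckpointing(60_000); // checkpoints go to the file-based backend

        Properties kafkaProps = new Properties();
        kafkaProps.setProperty("bootstrap.servers", "kafka:9092");
        kafkaProps.setProperty("group.id", "example-group");

        DataStream<String> events = env.addSource(
                new FlinkKafkaConsumer<>("input-topic", new SimpleStringSchema(), kafkaProps));

        DataStream<String> aggregated = events
                .keyBy(value -> value)           // key extraction is application specific
                .timeWindow(Time.minutes(1))     // windowed aggregation held in heap state
                .reduce((a, b) -> a + b);

        // Sink is Kafka here; some outputs go to an Elasticsearch sink instead.
        aggregated.addSink(
                new FlinkKafkaProducer<>("output-topic", new SimpleStringSchema(), kafkaProps));

        env.execute("streaming-job-sketch");
    }
}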

- Java Heap (reserved=26845184KB, committed=26845184KB)
(mmap: reserved=26845184KB, committed=26845184KB)

- Class (reserved=1241866KB, committed=219686KB)
(classes #36599)
(malloc=4874KB #74568)
(mmap: reserved=1236992KB, committed=214812KB)

- Thread (reserved=394394KB, committed=394394KB)
(thread #383)
(stack: reserved=392696KB, committed=392696KB)
(malloc=1250KB #1920)
(arena=448KB #764)

- Code (reserved=272178KB, committed=137954KB)
(malloc=22578KB #33442)
(mmap: reserved=249600KB, committed=115376KB)

- GC (reserved=1365088KB, committed=1365088KB)
(malloc=336112KB #1130298)
(mmap: reserved=1028976KB, committed=1028976KB)



--
Thanks
Josson

Attachment: memory_issue.pdf (1M)

Re: Flink 1.8.3 Kubernetes POD OOM

Fabian Hueske-2
Hi Josson,

I don't have much experience setting memory bounds in Kubernetes myself, but my colleague Andrey (in CC) reworked Flink's memory configuration for the last release to ease configuration in container environments.
He might be able to help.

Best, Fabian


Re: Flink 1.8.3 Kubernetes POD OOM

Andrey Zagrebin-5
Hi Josson,

Do you use a state backend? Is it RocksDB?

Best,
Andrey


Re: Flink 1.8.3 Kubernetes POD OOM

Josson Paul
Hi Andrey,
  We don't use RocksDB. As I said in the original email, I am using the file-based backend. Even though our cluster runs on Kubernetes, our Flink cluster uses Flink's standalone resource manager; we have not yet integrated our Flink deployment with Kubernetes.

Thanks,
Josson


Re: Flink 1.8.3 Kubernetes POD OOM

Josson Paul
Hi Andrey,
  To clarify the above email: I am using heap-based state, not RocksDB.

Thanks,
Josson
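
In other words, the state backend is configured roughly like the sketch below (the checkpoint path is illustrative); state objects stay on the JVM heap and only snapshots are written to files:

// Sketch of the heap-based state backend setup (checkpoint path is illustrative).
// FsStateBackend keeps working state as objects on the JVM heap and writes
// checkpoints/savepoints to files; RocksDBStateBackend (not used here) would
// keep working state in native, off-heap memory instead.
import org.apache.flink.runtime.state.filesystem.FsStateBackend;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class HeapStateBackendSketch {
    public static StreamExecutionEnvironment configure() {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        // The second argument enables asynchronous snapshots of the heap state.
        env.setStateBackend(new FsStateBackend("file:///opt/flink/checkpoints", true));
        return env;
    }
}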


Re: Flink 1.8.3 Kubernetes POD OOM

Andrey Zagrebin-4
Hi Josson,

Thanks for the details. Sorry, I overlooked that; you indeed mentioned the file-based backend.

Looking at the Flink memory model [1], I do not see any problems related to the types of memory consumption we model in Flink.
Direct memory consumption by the network stack corresponds to your configured fraction (0.02f), and the JVM heap cannot cause the problem.
I am not aware of any other types of memory consumption in Flink 1.8.

Nonetheless, there is no way to control all types of memory consumption, especially native memory allocated either by user code or by the JVM (if you do not use RocksDB, Flink barely uses native memory explicitly).
Examples (not exhaustive):
- native libraries in user code or its dependencies which allocate off-heap memory, e.g. via malloc (detecting this would require an OS-level process dump)
- JVM metaspace, thread stacks, GC overhead, etc. (none of which Flink 1.8 limits via JVM args)

Recently, we discovered some class-loading leaks (JVM metaspace), e.g. [2] or [3].
Since 1.10, Flink limits JVM metaspace and direct memory, so you would get a concrete OOM exception before the container dies.
Maybe the Kafka or Elasticsearch connector clients were updated along with Flink 1.8 and caused some leaks.
I cc'ed Gordon and Piotr in case they have an idea.

I suggest decreasing the POD memory, noting the consumed memory of each type at the moment the container dies
(as I suppose you already did), and then increasing the POD memory several times until you see which type of memory
consumption keeps growing until the OOM while the other types hopefully stabilise at some level.
Then you could take a dump of that ever-growing type of memory consumption to analyse whether there is a memory leak.
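
One way to watch those memory types from inside the JVM while you vary the POD memory (just a sketch, not something Flink provides out of the box) is to periodically log the JVM's buffer-pool and non-heap memory-pool MXBeans:

// Sketch: periodically log direct/mapped buffer pools and non-heap memory pools
// (e.g. Metaspace, Code Cache) so the ever-growing consumer can be identified.
import java.lang.management.BufferPoolMXBean;
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryType;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

public class NativeMemoryLogger {
    public static void start() {
        ScheduledExecutorService scheduler =
                Executors.newSingleThreadScheduledExecutor(r -> {
                    Thread t = new Thread(r, "native-memory-logger");
                    t.setDaemon(true);
                    return t;
                });
        scheduler.scheduleAtFixedRate(() -> {
            for (BufferPoolMXBean pool : ManagementFactory.getPlatformMXBeans(BufferPoolMXBean.class)) {
                // "direct" covers ByteBuffer.allocateDirect (network stack, Netty, connectors)
                System.out.printf("buffer pool %s: used=%d bytes, capacity=%d bytes%n",
                        pool.getName(), pool.getMemoryUsed(), pool.getTotalCapacity());
            }
            for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
                if (pool.getType() != MemoryType.HEAP && pool.getUsage() != null) {
                    System.out.printf("non-heap pool %s: used=%d bytes%n",
                            pool.getName(), pool.getUsage().getUsed());
                }
            }
        }, 0, 60, TimeUnit.SECONDS);
    }
}

Note that this only covers memory the JVM tracks itself; native allocations made by libraries via malloc would still require native memory tracking or an OS-level process dump.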

Best,
Andrey

[3] https://issues.apache.org/jira/browse/FLINK-11205
