State management and heap usage

Michael Latta
I am pretty new to Flink and have an initial streaming job working both locally and remotely. In both setups, however, it runs out of heap if the data volume is too high. I am using RichMapFunction to process multiple streams of data. I assumed Flink would keep state in RAM when possible and spill to RocksDB once it exceeded the heap.

Is this correct? If so, are there configs I need to set to enable or tune this so that the job can run within a fixed memory size?

Michael

Re: State management and heap usage

Gary Yao-2
Hi Michael,

You can configure the default state backend by setting state.backend in
flink-conf.yaml, or you can configure it per job [1]. The default state backend
is "jobmanager" (MemoryStateBackend), which stores state and checkpoints on the
Java heap. RocksDB must be explicitly enabled, e.g., by setting state.backend to
"rocksdb".

Best,
Gary

[1] https://ci.apache.org/projects/flink/flink-docs-master/ops/state/state_backends.html#configuring-a-state-backend
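
For reference, the cluster-wide setting described above could look like the following flink-conf.yaml fragment. This is only a minimal sketch: the state.backend key and the value "rocksdb" come from the reply, while the state.checkpoints.dir key and its path are assumptions added for illustration. Key names have changed between Flink releases, so check the linked documentation for the version in use.

```yaml
# flink-conf.yaml (sketch, not a complete configuration)

# Switch from the heap-based default to RocksDB, as suggested in the reply.
state.backend: rocksdb

# Assumption: a durable checkpoint location is also configured.
# The path below is hypothetical; substitute a filesystem the cluster can reach.
state.checkpoints.dir: hdfs:///flink/checkpoints
```

With the RocksDB backend, keyed state is held in RocksDB (native memory and local disk) rather than on the JVM heap. Only state registered through Flink's state API is handled this way; data kept in ordinary fields of a RichMapFunction stays on the heap regardless of the backend.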

On Wed, Apr 11, 2018 at 11:04 PM, TechnoMage <[hidden email]> wrote:
[...]

Re: State management and heap usage

Michael Latta
Thank you.

Michael

On Apr 12, 2018, at 2:45 AM, Gary Yao <[hidden email]> wrote:

[...]