Memory Management in Streaming?

Memory Management in Streaming?

Shaosu Liu
Hi,

I have had issues when processing large amounts of data (large windows where I could not do incremental updates): Flink slowed down significantly. It helped when I increased the amount of memory and used off-heap allocation, but that only delayed the onset of the problem without solving it.

Could someone give me some hints on how Flink manages window buffers and how streaming manages its memory? I see this page on batch API memory management and wonder what the equivalent is for streaming.

Re: Memory Management in Streaming?

Stefan Richter
Hi,

The memory management described in this wiki page only applies to the batch API. The streaming API currently uses the Java heap, but we are strongly considering introducing managed memory for streaming as well.

Best,
Stefan

On 02.09.2016 at 22:45, Shaosu Liu <[hidden email]> wrote:

Hi,

I have had issues when processing large amounts of data (large windows where I could not do incremental updates): Flink slowed down significantly. It helped when I increased the amount of memory and used off-heap allocation, but that only delayed the onset of the problem without solving it.

Could someone give me some hints on how Flink manages window buffers and how streaming manages its memory? I see this page on batch API memory management and wonder what the equivalent is for streaming.

Re: Memory Management in Streaming?

Jamie Grier
In reply to this post by Shaosu Liu
Hi Shaosu,

Do you have an estimate of the total size of the state you are keeping for the windows? Messages per second, window length, message size, etc. would be good details to include.
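As a rough illustration of the kind of estimate meant here, the sketch below multiplies message rate, window length, and message size. It assumes the window buffers every element (no incremental aggregation) and ignores per-object and serialization overhead, so the real footprint is likely higher. The class name, method name, and workload figures are all hypothetical and not taken from this thread.

```java
public class WindowStateEstimate {

    /**
     * Rough lower bound, in bytes, on the state held by one fully buffered window:
     * messages/sec * window length in seconds * bytes per message.
     * Throws ArithmeticException if the product overflows a long.
     */
    public static long estimateBytes(long messagesPerSecond, long windowSeconds, long bytesPerMessage) {
        long messagesInWindow = Math.multiplyExact(messagesPerSecond, windowSeconds);
        return Math.multiplyExact(messagesInWindow, bytesPerMessage);
    }

    public static void main(String[] args) {
        // Hypothetical workload figures, chosen only for illustration.
        long messagesPerSecond = 50_000;
        long windowSeconds = 3_600;   // a one-hour window
        long bytesPerMessage = 200;

        long bytes = estimateBytes(messagesPerSecond, windowSeconds, bytesPerMessage);
        System.out.println("Estimated buffered window state: " + bytes + " bytes");
    }
}
```

With these made-up figures the product is 36,000,000,000 bytes (about 36 GB). Comparing a number like this against the heap available to the task managers gives a first indication of whether the window state can fit in memory at all.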

Also, which state backend are you using? Have you considered using the RocksDB state backend? This backend will spill Flink state to disk if it's larger than available RAM. You'll also probably want to use "fully async" mode for the RocksDBStateBackend.

-Jamie


--

Jamie Grier
data Artisans, Director of Applications Engineering