Re: Streaming - memory management

classic Classic list List threaded Threaded
1 message Options
Reply | Threaded
Open this post in threaded view
|

Re: Streaming - memory management

Vinay Patil
Hi Stephan,

Just wanted to jump into this discussion regarding state.

So do you mean that if we maintain user-defined state (for non-window operators), then if we do  not clear it explicitly will the data for that key remains in RocksDB.

What happens in case of checkpoint ? I read in the documentation that after the checkpoint happens the rocksDB data is pushed to the desired location (hdfs or s3 or other fs), so for user-defined state does the data still remain in RocksDB after checkpoint ?

Correct me if I have misunderstood this concept

For one of our use we were going for this, but since I read the above part in documentation so we are going for Cassandra now (to store records and query them for a special case)





Regards,
Vinay Patil

On Wed, Aug 31, 2016 at 4:51 AM, Stephan Ewen <[hidden email]> wrote:
In streaming, memory is mainly needed for state (key/value state). The
exact representation depends on the chosen StateBackend.

State is explicitly released: For windows, state is cleaned up
automatically (firing / expiry), for user-defined state, keys have to be
explicitly cleared (clear() method) or in the future will have the option
to expire.

The heavy work horse for streaming state is currently RocksDB, which
internally uses native (off-heap) memory to keep the data.

Does that help?

Stephan


On Tue, Aug 30, 2016 at 11:52 PM, Roshan Naik <[hidden email]>
wrote:

> As per the docs, in Batch mode, dynamic memory allocation is avoided by
> storing messages being processed in ByteBuffers via Unsafe methods.
>
> Couldn't find any docs  describing mem mgmt in Streamingn mode. So...
>
> - Am wondering if this is also the case with Streaming ?
>
> - If so, how does Flink detect that an object is no longer being used and
> can be reclaimed for reuse once again ?
>
> -roshan
>