Flink FsStatebackend is giving better performance than RocksDB


Flink FsStatebackend is giving better performance than RocksDB

Vijay Bhaskar
Hi
While doing scale testing we observed that the FSStatebackend is outperforming RocksDB.
When using RocksDB, off-heap memory keeps growing over time, and after a day the pod got terminated with an OOM.

With the same data pattern, the FSStatebackend runs for days without any memory spike or OOM.

Based on the documentation, this is what I understood:

When the state size is so large that it can't be kept in memory, RocksDB is preferred. That indirectly means RocksDB is preferred only when the FSStatebackend starts performing poorly as the state size grows, right?

Another question: we have been running production with RocksDB for quite some time. If we switch to the FSStatebackend, what are the consequences?
The following is what I can think of:
The very first time, I'll lose the state.
Thereafter, the behavior with respect to savepoints and checkpoints is the same (I know there are no incremental checkpoints, but that is only a performance concern).

Other than that, will I run into any issues?

Regards
Bhaskar


Re: Flink FsStatebackend is giving better performance than RocksDB

David Anderson-3
You should be able to tune your setup to avoid the OOM problem you have run into with RocksDB. It will grow to use all of the memory available to it, but shouldn't leak. Perhaps something is misconfigured.
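
For illustration, a minimal flink-conf.yaml sketch of such a setup on Flink 1.10+ (the sizes are placeholders to adapt to your pod limits, not values from this thread):

    # Keep RocksDB's block cache and write buffers inside Flink's managed memory (the default since 1.10)
    state.backend: rocksdb
    state.backend.rocksdb.memory.managed: true

    # Size the whole TaskManager process below the pod's memory limit
    taskmanager.memory.process.size: 4096m
    # Fraction of Flink memory handed to managed-memory consumers such as RocksDB
    taskmanager.memory.managed.fraction: 0.4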

As for performance, with the FSStateBackend you can expect:

* much better throughput and average latency
* possibly worse worst-case latency, due to GC pauses

Large, multi-slot TMs can be more of a problem with the FSStateBackend because of the increased scope for GC. You may want to run with more TMs, each with fewer slots.
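
As a rough sketch (slot counts and sizes are only illustrative), that would mean something like:

    # Fewer slots per TaskManager keeps each JVM heap, and hence GC pauses, smaller
    taskmanager.numberOfTaskSlots: 2
    taskmanager.memory.process.size: 2048m
    # ...and scaling out by running more TaskManager pods instead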

You'll also be giving up the possibility of taking advantage of quick, local recovery [1] that comes for free when using incremental checkpoints with RocksDB. Local recovery can still be used with the FSStateBackend, but it comes at more of a cost.
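
For reference, the settings involved are roughly the following (a sketch, assuming flink-conf.yaml configuration):

    # Incremental checkpoints are only supported by the RocksDB backend
    state.backend.incremental: true
    # Task-local recovery works with either backend, but is cheap mainly with incremental RocksDB checkpoints
    state.backend.local-recovery: true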

As for migrating your state, you may be able to use the State Processor API [2] to rewrite a savepoint taken from RocksDB so that it can be loaded by the FSStateBackend, so long as you aren't using any windows or ListCheckpointed state. 
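
A rough sketch of that migration on Flink 1.10 (illustrative only: the operator uid, state name, key/value types, and paths below are made up, and it assumes a single ValueState per key with no windows or ListCheckpointed state):

    import org.apache.flink.api.common.state.ValueState;
    import org.apache.flink.api.common.state.ValueStateDescriptor;
    import org.apache.flink.api.java.DataSet;
    import org.apache.flink.api.java.ExecutionEnvironment;
    import org.apache.flink.configuration.Configuration;
    import org.apache.flink.contrib.streaming.state.RocksDBStateBackend;
    import org.apache.flink.runtime.state.filesystem.FsStateBackend;
    import org.apache.flink.state.api.BootstrapTransformation;
    import org.apache.flink.state.api.ExistingSavepoint;
    import org.apache.flink.state.api.OperatorTransformation;
    import org.apache.flink.state.api.Savepoint;
    import org.apache.flink.state.api.functions.KeyedStateBootstrapFunction;
    import org.apache.flink.state.api.functions.KeyedStateReaderFunction;
    import org.apache.flink.util.Collector;

    public class SavepointMigration {

        // Plain holder for one key and its state value.
        public static class KeyedValue {
            public String key;
            public Long value;
        }

        public static void main(String[] args) throws Exception {
            ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();

            // 1) Read the keyed state out of the existing RocksDB savepoint.
            ExistingSavepoint oldSavepoint = Savepoint.load(
                    env, "hdfs:///savepoints/rocksdb-savepoint", new RocksDBStateBackend("hdfs:///checkpoints"));
            DataSet<KeyedValue> state = oldSavepoint.readKeyedState("my-operator-uid", new Reader());

            // 2) Bootstrap the same state into a new savepoint written with the FsStateBackend.
            BootstrapTransformation<KeyedValue> transformation = OperatorTransformation
                    .bootstrapWith(state)
                    .keyBy(kv -> kv.key)
                    .transform(new Writer());

            Savepoint.create(new FsStateBackend("hdfs:///checkpoints"), 128)   // 128 = max parallelism of the job
                    .withOperator("my-operator-uid", transformation)
                    .write("hdfs:///savepoints/fs-savepoint");

            env.execute("migrate savepoint to FsStateBackend");
        }

        // Reads the ValueState registered as "my-state" for every key.
        static class Reader extends KeyedStateReaderFunction<String, KeyedValue> {
            ValueState<Long> valueState;

            @Override
            public void open(Configuration parameters) {
                valueState = getRuntimeContext().getState(new ValueStateDescriptor<>("my-state", Long.class));
            }

            @Override
            public void readKey(String key, Context ctx, Collector<KeyedValue> out) throws Exception {
                KeyedValue kv = new KeyedValue();
                kv.key = key;
                kv.value = valueState.value();
                out.collect(kv);
            }
        }

        // Writes the same ValueState back under the same name and operator uid.
        static class Writer extends KeyedStateBootstrapFunction<String, KeyedValue> {
            ValueState<Long> valueState;

            @Override
            public void open(Configuration parameters) {
                valueState = getRuntimeContext().getState(new ValueStateDescriptor<>("my-state", Long.class));
            }

            @Override
            public void processElement(KeyedValue kv, Context ctx) throws Exception {
                valueState.update(kv.value);
            }
        }
    }

The new savepoint can then be used with "flink run -s hdfs:///savepoints/fs-savepoint ..." once the job itself is configured to use the FsStateBackend.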


Best,
David



Re: Flink FsStatebackend is giving better performance than RocksDB

Yun Tang
Hi Vijay,

I think David has provided enough background on the differences between these two state backends.
Regarding the OOM problem with RocksDB: since Flink 1.10, RocksDB's memory is bounded by Flink's managed memory. However, RocksDB will use as much of the managed memory as possible, and any remaining native memory it needs has to fit into the JVM overhead space [1], so consider increasing the memory of that part.
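
In flink-conf.yaml terms that would look roughly like the following (an illustrative sketch; the actual values depend on your pod sizes):

    # Native memory that RocksDB allocates outside the managed pool must fit into the JVM overhead
    taskmanager.memory.jvm-overhead.fraction: 0.2
    taskmanager.memory.jvm-overhead.max: 2g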



Best,
Yun Tang
