Change state backend.

Change state backend.

shashank734
Hi,

Can I change the state backend from FsStateBackend to RocksDBStateBackend directly, or do I have to do some migration?


--
Thanks Regards

SHASHANK AGARWAL
 ---  Trying to mobilize the things....





Re: Change state backend.

Biplob Biswas
Could you clarify a bit more? Do you want existing state on a running job to be migrated from FsStateBackend to RocksDBStateBackend?

Or

Do you have the option of restarting your job after changing existing code?




Re: Change state backend.

Ted Yu
I guess Shashank meant switching the state backend w.r.t. savepoints.
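
For context, which backend a job uses is a configuration choice; a sketch assuming the Flink 1.3-era `flink-conf.yaml` keys (the exact checkpoint-directory key name may differ by version):

```
state.backend: rocksdb
state.backend.fs.checkpointdir: hdfs:///flink/checkpoints
```

The question is then whether a savepoint taken while `state.backend: filesystem` was set can still be restored after switching this value to `rocksdb`.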

On Wed, Aug 16, 2017 at 4:00 AM, Biplob Biswas <[hidden email]> wrote:







--
View this message in context: http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/Change-state-backend-tp14928p14930.html
Sent from the Apache Flink User Mailing List archive. mailing list archive at Nabble.com.


Re: Change state backend.

Biplob Biswas
I am not really sure you can do that out of the box; if not, it should indeed be possible in the near future.

https://ci.apache.org/projects/flink/flink-docs-release-1.3/dev/stream/state.html#handling-serializer-upgrades-and-compatibility

There are already plans for state migration (with upgraded serializers), as I read there, so this could be an additional task while migrating state, although I am not sure how easy or difficult it would be.

Also, as you can read here, http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/Evolving-serializers-and-impact-on-flink-managed-states-td14777.html

Stefan explained really nicely what happens during state migration on the different backends.

So based on that, I imagine that moving from FsStateBackend to RocksDBStateBackend or from MemoryStateBackend to RocksDBStateBackend would be easier, but not the other way round.

Thanks

Re: Change state backend.

Stefan Richter
This is not possible out of the box. Historically, the checkpoint/savepoint formats have been different between heap based and RocksDB based backends. We have already eliminated most differences in 1.3.

However, there are two problems remaining. The first problem is just how the number of written key-value pairs is tracked: in the heap case, we have all the counts and can serialize: count + iterate all elements. For RocksDB this is not possible because there is no way to get the exact key count, so the serialization iterates all elements and then writes a terminal symbol to the stream. So what we could consider in the future is to change the heap backends to write in the same way as RocksDB, which is a slightly less natural approach, but much easier than trying to emulate the current heap approach with RocksDB. The latter would require us to do an iteration over the whole state just to obtain the counts.
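
The two stream layouts can be sketched in plain Java (a toy illustration only, not Flink's actual checkpoint code; the class and method names here are invented):

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.HashMap;
import java.util.Map;

// Toy illustration of the two layouts described above -- NOT Flink's
// actual checkpoint code.
public class StateLayouts {

    // Heap-style: the entry count is known up front, so write "count + entries".
    static byte[] writeCounted(Map<String, Integer> state) {
        try {
            ByteArrayOutputStream bos = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(bos);
            out.writeInt(state.size()); // count is available before iterating
            for (Map.Entry<String, Integer> e : state.entrySet()) {
                out.writeUTF(e.getKey());
                out.writeInt(e.getValue());
            }
            return bos.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // RocksDB-style: the exact count is unknown, so iterate the entries and
    // finish with a terminal symbol (here: a boolean "has next" flag).
    static byte[] writeTerminated(Map<String, Integer> state) {
        try {
            ByteArrayOutputStream bos = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(bos);
            for (Map.Entry<String, Integer> e : state.entrySet()) {
                out.writeBoolean(true); // one more entry follows
                out.writeUTF(e.getKey());
                out.writeInt(e.getValue());
            }
            out.writeBoolean(false); // terminal symbol, no count needed
            return bos.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    static Map<String, Integer> readTerminated(byte[] bytes) {
        try {
            DataInputStream in = new DataInputStream(new ByteArrayInputStream(bytes));
            Map<String, Integer> state = new HashMap<>();
            while (in.readBoolean()) {
                state.put(in.readUTF(), in.readInt());
            }
            return state;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

The terminated layout is the style the heap backends would have to adopt to match RocksDB; the counted layout is what RocksDB cannot produce without a full extra iteration over the state.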

The second problem is about key-groups: we keep all k/v pairs in RocksDB in key-group order by prefixing the key bytes with the key-group id (this is important for rescaling). When we write the elements from RocksDB to disk, they include the prefix. For the heap backend, this prefix is not required because (as we are in memory) we can very efficiently do the key-group partitioning in the async part of a checkpoint/savepoint. So here we have two options, both with some pros and cons. Either we don’t write the key-group bytes with RocksDB and recompute them on restore (this means we have to go through serde and compute a hash), or we add the key-group to the heap format (1-2 bytes extra per key).
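
The key-group prefixing can also be sketched (simplified: Flink additionally runs `key.hashCode()` through a murmur hash before taking the modulo of the maximum parallelism; the names below are invented for illustration):

```java
import java.nio.charset.StandardCharsets;

// Toy sketch of key-group assignment and key prefixing -- not Flink's code.
// Flink murmur-hashes key.hashCode() first; this version is simplified.
public class KeyGroups {

    // Assign a key to one of maxParallelism key-groups.
    static int keyGroupFor(Object key, int maxParallelism) {
        return Math.floorMod(key.hashCode(), maxParallelism);
    }

    // RocksDB-style key bytes: a fixed-width key-group prefix (2 bytes here)
    // followed by the serialized key, so a byte-ordered scan of the store
    // visits keys grouped by key-group -- which is what makes rescaling cheap.
    static byte[] prefixedKey(String key, int maxParallelism) {
        int group = keyGroupFor(key, maxParallelism);
        byte[] keyBytes = key.getBytes(StandardCharsets.UTF_8);
        byte[] out = new byte[2 + keyBytes.length];
        out[0] = (byte) (group >>> 8); // high byte of the key-group id
        out[1] = (byte) group;         // low byte
        System.arraycopy(keyBytes, 0, out, 2, keyBytes.length);
        return out;
    }
}
```

Recomputing the prefix on restore means deserializing each key and re-running this hash, which is the serde cost Stefan mentions for the first option.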

So, it is definitely possible to unify both formats completely, but those two points need to be resolved.

Hope this gives some more details to the discussion.

Best,
Stefan

> On 17.08.2017 at 10:50, Biplob Biswas <[hidden email]> wrote:


Re: Change state backend.

Ted Yu
bq. we add the key-group to the heap format (1-2 bytes extra per key).

This seems to be the better choice of the two.

bq. change the heap backends to write in the same way as RocksDB

+1 on above.

These two combined would give users flexibility in state backend migration.

On Thu, Aug 17, 2017 at 2:55 AM, Stefan Richter <[hidden email]> wrote: