Rocksdb to filesystem state migration errors

Rocksdb to filesystem state migration errors

Lakshmi Gururaja Rao
Hey all,

I'm trying to migrate state from the RocksDB backend to the filesystem backend. The approach I'm taking is:
1) Cancel the job with a savepoint while it's running on RocksDB
2) Update the job/cluster to use filesystem as the state backend
3) Submit the job with the previous RocksDB savepoint

From what I understand about savepoints, this should work out of the box? However, it works in some cases but fails in others. Specifically, whenever a job has user-managed state, e.g., a ProcessFunction with a ValueState, the restore throws the following error:

Caused by: java.lang.IllegalStateException: Unexpected key-group in restore.
	at org.apache.flink.util.Preconditions.checkState(Preconditions.java:195)
	at org.apache.flink.runtime.state.heap.HeapKeyedStateBackend.restorePartitionedState(HeapKeyedStateBackend.java:418)
	at org.apache.flink.runtime.state.heap.HeapKeyedStateBackend.restore(HeapKeyedStateBackend.java:315)
	at org.apache.flink.runtime.state.heap.HeapKeyedStateBackend.restore(HeapKeyedStateBackend.java:95)
	at org.apache.flink.streaming.api.operators.BackendRestorerProcedure.attemptCreateAndRestore(BackendRestorerProcedure.java:151)
	at org.apache.flink.streaming.api.operators.BackendRestorerProcedure.createAndRestore(BackendRestorerProcedure.java:123)


The error comes from a precondition check in HeapKeyedStateBackend. While debugging, I found that the value of writtenKeyGroupIndex always evaluates to 0, which fails the check.
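
To illustrate the kind of check that fails here, below is a minimal, self-contained Java sketch. It is not Flink's code. The class name, both byte layouts, and the helper methods are hypothetical and invented for illustration, and the second layout is not meant to describe the actual RocksDB savepoint format. It only shows how a reader that expects a key-group index at the start of the data rejects bytes written in a different layout.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

public class KeyGroupRestoreSketch {

    // Hypothetical layout A: [int keyGroupIndex][int length][value bytes]
    static byte[] writeWithKeyGroupPrefix(int keyGroupIndex, String value) {
        byte[] payload = value.getBytes(StandardCharsets.UTF_8);
        return ByteBuffer.allocate(4 + 4 + payload.length)
                .putInt(keyGroupIndex)
                .putInt(payload.length)
                .put(payload)
                .array();
    }

    // Hypothetical layout B: [short stateId][int length][value bytes]
    // There is no key-group index at the start of the data.
    static byte[] writeWithStateIdPrefix(short stateId, String value) {
        byte[] payload = value.getBytes(StandardCharsets.UTF_8);
        return ByteBuffer.allocate(2 + 4 + payload.length)
                .putShort(stateId)
                .putInt(payload.length)
                .put(payload)
                .array();
    }

    // A reader that assumes layout A and verifies the leading key-group index.
    static String restore(byte[] data, int expectedKeyGroupIndex) {
        ByteBuffer in = ByteBuffer.wrap(data);
        int writtenKeyGroupIndex = in.getInt();
        if (writtenKeyGroupIndex != expectedKeyGroupIndex) {
            throw new IllegalStateException("Unexpected key-group in restore.");
        }
        byte[] payload = new byte[in.getInt()];
        in.get(payload);
        return new String(payload, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // Writer and reader agree on the layout, so the restore succeeds.
        System.out.println(restore(writeWithKeyGroupPrefix(7, "state"), 7));

        // Layout B starts with a zero state id followed by a small length,
        // so the first four bytes decode to 0 and the check fails.
        try {
            restore(writeWithStateIdPrefix((short) 0, "state"), 7);
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

In this sketch the mismatch arises purely because the reader and writer disagree on the byte layout. Whether that is the cause of the error above is an assumption, not something the stack trace confirms.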

Has anyone run into this issue before?

Thanks
Lakshmi

Re: Rocksdb to filesystem state migration errors

Congxian Qiu

Hi Lakshmi

Currently, we can't switch between the RocksDB and filesystem backends using a savepoint. There is an open issue to fix this [1].


[1] https://issues.apache.org/jira/browse/FLINK-11254


Best,
Congxian


(In reply to Lakshmi Gururaja Rao's message of Fri, Mar 15, 2019 at 8:07 AM.)

Re: Rocksdb to filesystem state migration errors

Lakshmi Gururaja Rao
Thanks for pointing me to the JIRA, Congxian. 

(In reply to Congxian Qiu's message of Thu, Mar 14, 2019 at 6:14 PM.)

--
Lakshmi Gururaja Rao
SWE
217.778.7218
Lyft