Flink upgrade causes operator to lose state

classic Classic list List threaded Threaded
3 messages Options
Reply | Threaded
Open this post in threaded view
|

Flink upgrade causes operator to lose state

soumoks
Hi,

We are upgrading several applications from Flink 1.9.1 to 1.11.2.
Some of the applications written with Table API are not able start from
savepoint after the upgrade and fail with the following error.

Caused by: java.lang.IllegalStateException: Failed to rollback to
checkpoint/savepoint s3://0XXXXX/savepoint-9bd1c7-8cafa2c1a9ac. Cannot map
checkpoint/savepoint state for operator 49bb9e12f4a332535e9b828c1d4e2c0a to
the new program, because the operator is not available in the new program.
If you want to allow to skip this, you can set the --allowNonRestoredState
option on the CLI.
        at
org.apache.flink.runtime.checkpoint.Checkpoints.throwNonRestoredStateException(Checkpoints.java:210)
        at
org.apache.flink.runtime.checkpoint.Checkpoints.loadAndValidateCheckpoint(Checkpoints.java:180)
        at



Starting with --allowNonRestoredState option loses multiple state operators
and is not an option.


and a few other apps are failing with the following error.

com.esotericsoftware.kryo.KryoException: Encountered unregistered class ID:
2532
Serialization trace:
fieldValueMap (io.caseclass.samplecaseclass)

This case class consists of a Mutable.Map which seems to be causing the
issue.



And finally another app fails with the following error.


Caused by: org.apache.flink.util.FlinkException: Could not restore keyed
state backend for
KeyedProcessOperator_48c7355e6ee5ecb2411313ac3173573d_(1/1) from any of the
1 provided restore options.
        at
org.apache.flink.streaming.api.operators.BackendRestorerProcedure.createAndRestore(BackendRestorerProcedure.java:135)
        at
org.apache.flink.streaming.api.operators.StreamTaskStateInitializerImpl.keyedStatedBackend(StreamTaskStateInitializerImpl.java:317)
        at
org.apache.flink.streaming.api.operators.StreamTaskStateInitializerImpl.streamOperatorStateContext(StreamTaskStateInitializerImpl.java:144)
        ... 9 more

Caused by: java.io.IOException: Could not find class
'org.apache.flink.table.runtime.typeutils.BaseRowSerializer$BaseRowSerializerSnapshot'
in classpath.
        at
org.apache.flink.util.InstantiationUtil.resolveClassByName(InstantiationUtil.java:721)
        at
org.apache.flink.api.common.typeutils.TypeSerializerSnapshotSerializationUtil.readAndInstantiateSnapshotClass(TypeSerializerSnapshotSerializationUtil.java:84)




From the savepoint compatibility doc[1], restoring state across Flink 1.9.1
and 1.11.2 should be possible but it does not seem to be the case for the
above apps.

[1] -
https://ci.apache.org/projects/flink/flink-docs-stable/ops/upgrading.html





--
Sent from: http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/
Reply | Threaded
Open this post in threaded view
|

Re: Flink upgrade causes operator to lose state

Chesnay Schepler
It is currently not possible to upgrade table API / SQL applications via
savepoints.

This thread may provide some more insights:
http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/Flink-SQL-and-checkpoints-and-savepoints-td40749.html

On 3/3/2021 6:53 PM, soumoks wrote:

> Hi,
>
> We are upgrading several applications from Flink 1.9.1 to 1.11.2.
> Some of the applications written with Table API are not able start from
> savepoint after the upgrade and fail with the following error.
>
> Caused by: java.lang.IllegalStateException: Failed to rollback to
> checkpoint/savepoint s3://0XXXXX/savepoint-9bd1c7-8cafa2c1a9ac. Cannot map
> checkpoint/savepoint state for operator 49bb9e12f4a332535e9b828c1d4e2c0a to
> the new program, because the operator is not available in the new program.
> If you want to allow to skip this, you can set the --allowNonRestoredState
> option on the CLI.
> at
> org.apache.flink.runtime.checkpoint.Checkpoints.throwNonRestoredStateException(Checkpoints.java:210)
> at
> org.apache.flink.runtime.checkpoint.Checkpoints.loadAndValidateCheckpoint(Checkpoints.java:180)
> at
>
>
>
> Starting with --allowNonRestoredState option loses multiple state operators
> and is not an option.
>
>
> and a few other apps are failing with the following error.
>
> com.esotericsoftware.kryo.KryoException: Encountered unregistered class ID:
> 2532
> Serialization trace:
> fieldValueMap (io.caseclass.samplecaseclass)
>
> This case class consists of a Mutable.Map which seems to be causing the
> issue.
>
>
>
> And finally another app fails with the following error.
>
>
> Caused by: org.apache.flink.util.FlinkException: Could not restore keyed
> state backend for
> KeyedProcessOperator_48c7355e6ee5ecb2411313ac3173573d_(1/1) from any of the
> 1 provided restore options.
> at
> org.apache.flink.streaming.api.operators.BackendRestorerProcedure.createAndRestore(BackendRestorerProcedure.java:135)
> at
> org.apache.flink.streaming.api.operators.StreamTaskStateInitializerImpl.keyedStatedBackend(StreamTaskStateInitializerImpl.java:317)
> at
> org.apache.flink.streaming.api.operators.StreamTaskStateInitializerImpl.streamOperatorStateContext(StreamTaskStateInitializerImpl.java:144)
> ... 9 more
>
> Caused by: java.io.IOException: Could not find class
> 'org.apache.flink.table.runtime.typeutils.BaseRowSerializer$BaseRowSerializerSnapshot'
> in classpath.
> at
> org.apache.flink.util.InstantiationUtil.resolveClassByName(InstantiationUtil.java:721)
> at
> org.apache.flink.api.common.typeutils.TypeSerializerSnapshotSerializationUtil.readAndInstantiateSnapshotClass(TypeSerializerSnapshotSerializationUtil.java:84)
>
>
>
>
> >From the savepoint compatibility doc[1], restoring state across Flink 1.9.1
> and 1.11.2 should be possible but it does not seem to be the case for the
> above apps.
>
> [1] -
> https://ci.apache.org/projects/flink/flink-docs-stable/ops/upgrading.html
>
>
>
>
>
> --
> Sent from: http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/


Reply | Threaded
Open this post in threaded view
|

AW: Flink upgrade causes operator to lose state

Jan Oelschlegel
Maybe this strategy with evolving an application without the need for state restore could help you

https://docs.ververica.com/v2.3/user_guide/sql_development/sql_scripts.html#sql-script-changes

-----Ursprüngliche Nachricht-----
Von: Chesnay Schepler <[hidden email]>
Gesendet: Mittwoch, 3. März 2021 21:35
An: soumoks <[hidden email]>; [hidden email]
Betreff: Re: Flink upgrade causes operator to lose state

It is currently not possible to upgrade table API / SQL applications via savepoints.

This thread may provide some more insights:
http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/Flink-SQL-and-checkpoints-and-savepoints-td40749.html

On 3/3/2021 6:53 PM, soumoks wrote:

> Hi,
>
> We are upgrading several applications from Flink 1.9.1 to 1.11.2.
> Some of the applications written with Table API are not able start
> from savepoint after the upgrade and fail with the following error.
>
> Caused by: java.lang.IllegalStateException: Failed to rollback to
> checkpoint/savepoint s3://0XXXXX/savepoint-9bd1c7-8cafa2c1a9ac. Cannot
> map checkpoint/savepoint state for operator
> 49bb9e12f4a332535e9b828c1d4e2c0a to the new program, because the operator is not available in the new program.
> If you want to allow to skip this, you can set the
> --allowNonRestoredState option on the CLI.
> at
> org.apache.flink.runtime.checkpoint.Checkpoints.throwNonRestoredStateException(Checkpoints.java:210)
> at
> org.apache.flink.runtime.checkpoint.Checkpoints.loadAndValidateCheckpoint(Checkpoints.java:180)
> at
>
>
>
> Starting with --allowNonRestoredState option loses multiple state
> operators and is not an option.
>
>
> and a few other apps are failing with the following error.
>
> com.esotericsoftware.kryo.KryoException: Encountered unregistered class ID:
> 2532
> Serialization trace:
> fieldValueMap (io.caseclass.samplecaseclass)
>
> This case class consists of a Mutable.Map which seems to be causing
> the issue.
>
>
>
> And finally another app fails with the following error.
>
>
> Caused by: org.apache.flink.util.FlinkException: Could not restore
> keyed state backend for
> KeyedProcessOperator_48c7355e6ee5ecb2411313ac3173573d_(1/1) from any
> of the
> 1 provided restore options.
> at
> org.apache.flink.streaming.api.operators.BackendRestorerProcedure.createAndRestore(BackendRestorerProcedure.java:135)
> at
> org.apache.flink.streaming.api.operators.StreamTaskStateInitializerImpl.keyedStatedBackend(StreamTaskStateInitializerImpl.java:317)
> at
> org.apache.flink.streaming.api.operators.StreamTaskStateInitializerImpl.streamOperatorStateContext(StreamTaskStateInitializerImpl.java:144)
> ... 9 more
>
> Caused by: java.io.IOException: Could not find class
> 'org.apache.flink.table.runtime.typeutils.BaseRowSerializer$BaseRowSerializerSnapshot'
> in classpath.
> at
> org.apache.flink.util.InstantiationUtil.resolveClassByName(InstantiationUtil.java:721)
> at
> org.apache.flink.api.common.typeutils.TypeSerializerSnapshotSerializat
> ionUtil.readAndInstantiateSnapshotClass(TypeSerializerSnapshotSerializ
> ationUtil.java:84)
>
>
>
>
> >From the savepoint compatibility doc[1], restoring state across Flink
> >1.9.1
> and 1.11.2 should be possible but it does not seem to be the case for
> the above apps.
>
> [1] -
> https://ci.apache.org/projects/flink/flink-docs-stable/ops/upgrading.h
> tml
>
>
>
>
>
> --
> Sent from:
> http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/


HINWEIS: Dies ist eine vertrauliche Nachricht und nur für den Adressaten bestimmt. Es ist nicht erlaubt, diese Nachricht zu kopieren oder Dritten zugänglich zu machen. Sollten Sie diese Nachricht irrtümlich erhalten haben, bitte ich um Ihre Mitteilung per E-Mail oder unter der oben angegebenen Telefonnummer.