Flink can't initialize operator state backend when starting from checkpoint

Marvin777
Hi all,

When my Flink (1.4.2) job starts, it finds the checkpoint files on HDFS, but an exception occurs during deserialization:

[attachment: image.png]

Do you have any insight on this?

Thanks,
Qingxiang Ma

Re: Flink can't initialize operator state backend when starting from checkpoint

vino yang
Hi Qingxiang,

A few days ago, Stefan described the likely causes of this kind of failure in a similar thread:
"Typically, these problems have been observed when something was wrong with a serializer or a stateful serializer was used from multiple threads."
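
To make the multi-threading point concrete, here is a minimal plain-Java sketch, not Flink code. `StatefulStringSerializer` and `roundTripWithDuplicates` are hypothetical names invented for illustration. The serializer reuses an internal buffer, so a single instance must not be shared between threads; its `duplicate()` method only mirrors the idea behind Flink's `TypeSerializer#duplicate()`, and each thread works on its own copy.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class SerializerDuplicationSketch {

    /** Hypothetical stateful serializer: it reuses one scratch buffer, so it is not thread-safe. */
    static final class StatefulStringSerializer {
        private final ByteBuffer scratch = ByteBuffer.allocate(1024); // mutable per-instance state

        byte[] serialize(String value) {
            byte[] utf8 = value.getBytes(StandardCharsets.UTF_8);
            scratch.clear();
            scratch.putInt(utf8.length).put(utf8); // length prefix, then payload
            scratch.flip();
            byte[] out = new byte[scratch.remaining()];
            scratch.get(out);
            return out;
        }

        String deserialize(byte[] bytes) {
            ByteBuffer in = ByteBuffer.wrap(bytes);
            byte[] utf8 = new byte[in.getInt()];
            in.get(utf8);
            return new String(utf8, StandardCharsets.UTF_8);
        }

        /** Returns an independent copy with its own buffer. */
        StatefulStringSerializer duplicate() {
            return new StatefulStringSerializer();
        }
    }

    /** Round-trips values on several threads; every thread gets its own duplicate. */
    static boolean roundTripWithDuplicates(int threads, int valuesPerThread) {
        StatefulStringSerializer original = new StatefulStringSerializer();
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<Boolean>> results = new ArrayList<>();
            for (int t = 0; t < threads; t++) {
                final int id = t;
                final StatefulStringSerializer mine = original.duplicate(); // one copy per thread
                results.add(pool.submit(() -> {
                    for (int i = 0; i < valuesPerThread; i++) {
                        String value = "thread-" + id + "-value-" + i;
                        if (!value.equals(mine.deserialize(mine.serialize(value)))) {
                            return false;
                        }
                    }
                    return true;
                }));
            }
            for (Future<Boolean> result : results) {
                if (!result.get()) {
                    return false;
                }
            }
            return true;
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException("round-trip task failed", e);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println("all round trips ok: " + roundTripWithDuplicates(4, 1000));
    }
}
```

If all threads called `serialize` on the same instance instead, they would race on the shared `scratch` buffer and could produce corrupted bytes, which is the kind of damage that only shows up later, on restore.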

Thanks, vino.

In reply to Marvin777's message of Fri, Sep 21, 2018, 3:20 PM.

Re: Flink can't initialize operator state backend when starting from checkpoint

Stefan Richter
Hi,

That is correct. If you are using custom serializers, you should double-check their correctness, for example with our test base for type serializers. Another possible cause is that you modified the job in a way that silently changed the schema. Concurrent use of a serializer across different threads can also cause problems like this, and I think there was a bug in 1.4 around this topic. I suggest that you also update to a newer version, at least the latest bugfix release.
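
As a rough illustration of the first two points, here is a plain-Java sketch that does not use Flink or its serializer test base. The record layout and the names `writeV1`, `readV1`, `readV2`, and `restores` are all hypothetical. It writes a small "snapshot" with one layout, then reads it back with the matching reader and with a reader whose field type was changed, which is the sort of silent schema change that breaks a restore.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;

public class SchemaRoundTripSketch {

    /** Hypothetical state record, version 1: a long id followed by an int counter. */
    static byte[] writeV1(long id, int count) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            out.writeLong(id);
            out.writeInt(count);
        }
        return bytes.toByteArray();
    }

    /** Reader that matches the version 1 layout; returns "id:count". */
    static String readV1(byte[] data) throws IOException {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(data))) {
            return in.readLong() + ":" + in.readInt();
        }
    }

    /** Reader after a silent schema change: the counter is now read as a long. */
    static String readV2(byte[] data) throws IOException {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(data))) {
            return in.readLong() + ":" + in.readLong();
        }
    }

    /** Writes a snapshot with the old layout and reports whether the chosen reader restores it. */
    static boolean restores(boolean useChangedReader) {
        try {
            byte[] snapshot = writeV1(42L, 7);
            String restored = useChangedReader ? readV2(snapshot) : readV1(snapshot);
            return "42:7".equals(restored);
        } catch (IOException e) {
            // The changed reader runs out of bytes (EOFException) on the old snapshot.
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println("matching reader restores: " + restores(false));
        System.out.println("changed reader restores:  " + restores(true));
    }
}
```

A round-trip check like this, run against every serializer the job uses, is the basic idea; a real test should also cover edge cases such as empty and very large values.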

Best,
Stefan

In reply to vino yang's message of Sep 21, 2018, 10:26.

Re: Flink can't initialize operator state backend when starting from checkpoint

Marvin777
Hi,

I do not use custom serializers, and my job contains only a source and a sink (BucketingSink). What typically causes this kind of failure?

"I suggest that you also update to a newer version, at least the latest bugfix release."

Which version does this sentence refer to? And could you please point me to the issue related to this topic?

Thanks a lot.

In reply to Stefan Richter's message of Fri, Sep 21, 2018, 4:48 PM.