Cannot restore from savepoint after adding a sink operator

Sam Huang
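Hi all! I added an S3 bucketing sink operator to my Flink job and tried to start it from a savepoint using the --allowNonRestoredState option, and it's showing me this error:
<http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/file/n14674/flink_exception.png>

I found this in the official Flink documentation:
<http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/file/n14674/flink_state.png>

So I suppose the restoration should succeed even if the number of operators doesn't match. Can someone explain why this happens, and what a possible solution is?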

FYI, some specs:
Flink version: 1.2.1
Job parallelism: 10
S3 sink parallelism: 1
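Job execution graph:
<http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/file/n14674/s3_sink_graph.png>
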
Thanks

Re: Cannot restore from savepoint after adding a sink operator

Stefan Richter
Hi,

In Flink 1.2.x the restore will not succeed because state is mapped at the task level, not at the operator level. This makes it impossible to add a stateful operator to an existing operator chain, because Flink cannot figure out which state belongs to which operator after such a modification.

However, we changed this starting with Flink 1.3, which maps state to individual operators. If you need to make this change, I think the way to go is to first upgrade your job to 1.3 from the savepoint, and only after that add the sink operator. Please also make sure that you have UUIDs assigned to your operators, as described in the documentation here: https://ci.apache.org/projects/flink/flink-docs-release-1.3/ops/production_ready.html
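
To make the difference concrete, here is a small plain-Java sketch of the two mapping strategies. It is a simplified model for illustration only, not Flink's actual restore code: the class, the method names, and the operator IDs ("kafka-source", "parse-map", "s3-sink") are all hypothetical, and state is reduced to a plain string. In a real job the IDs would be the ones you set with uid(...) on each operator.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Toy model (NOT Flink internals): contrasts position-based and ID-based state mapping.
public class SavepointMappingSketch {

    // Task-level mapping (simplified 1.2.x behaviour): the i-th saved state
    // goes to the i-th operator of the chain, so the chain must not change shape.
    public static Map<String, String> restoreByChainPosition(
            List<String> savedChainState, List<String> newChain) {
        if (savedChainState.size() != newChain.size()) {
            throw new IllegalStateException(
                "operator chain length changed from "
                    + savedChainState.size() + " to " + newChain.size());
        }
        Map<String, String> restored = new LinkedHashMap<>();
        for (int i = 0; i < newChain.size(); i++) {
            restored.put(newChain.get(i), savedChainState.get(i));
        }
        return restored;
    }

    // Operator-level mapping (simplified 1.3+ behaviour): state is looked up by
    // operator ID. A newly added operator simply finds no state (null here).
    // Saved state that matches no operator is only tolerated when
    // allowNonRestoredState is true.
    public static Map<String, String> restoreByOperatorId(
            Map<String, String> savedStateById,
            List<String> newOperatorIds,
            boolean allowNonRestoredState) {
        List<String> unmatched = new ArrayList<>(savedStateById.keySet());
        unmatched.removeAll(newOperatorIds);
        if (!unmatched.isEmpty() && !allowNonRestoredState) {
            throw new IllegalStateException(
                "savepoint state has no matching operator: " + unmatched);
        }
        Map<String, String> restored = new LinkedHashMap<>();
        for (String id : newOperatorIds) {
            restored.put(id, savedStateById.get(id));
        }
        return restored;
    }

    public static void main(String[] args) {
        // Hypothetical chain after adding the sink.
        List<String> newChain = Arrays.asList("kafka-source", "parse-map", "s3-sink");

        try {
            restoreByChainPosition(Arrays.asList("offsets", "counters"), newChain);
        } catch (IllegalStateException e) {
            System.out.println("position-based restore failed: " + e.getMessage());
        }

        Map<String, String> saved = new LinkedHashMap<>();
        saved.put("kafka-source", "offsets");
        saved.put("parse-map", "counters");
        System.out.println("id-based restore: "
            + restoreByOperatorId(saved, newChain, false));
    }
}
```

In this model the position-based restore rejects the longer chain, while the ID-based restore hands the two existing operators their old state and lets the new sink start empty. Note that the allow-non-restored flag only matters in the opposite case, where the savepoint contains state for an operator that no longer exists in the job.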

Best,
Stefan

On 04.08.2017 at 00:44, Sam Huang <[hidden email]> wrote:

Hi all! I added an S3 bucketing sink operator to my Flink job and tried to
start it from a savepoint using the --allowNonRestoredState option, and it's
showing me this error:
<http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/file/n14674/flink_exception.png>

I found this in the official Flink documentation:
<http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/file/n14674/flink_state.png>
So I suppose the restoration should succeed even if the number of operators
doesn't match. Can someone explain why this happens, and what a possible
solution is?

FYI, some specs:
Flink version: 1.2.1
Job parallelism: 10
S3 sink parallelism: 1
Job execution graph:
<http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/file/n14674/s3_sink_graph.png>


Thanks




--
View this message in context: http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/Cannot-restore-from-savepoint-after-adding-a-sink-operator-tp14674.html
Sent from the Apache Flink User Mailing List archive at Nabble.com.