Savepoints and checkpoints

classic Classic list List threaded Threaded
3 messages Options
Reply | Threaded
Open this post in threaded view
|

Savepoints and checkpoints

min.tan

Hi,

 

Are Flink savepoints and checkpoitns still vlaid after some data entity changes e.g. Kafka topic name changes? I expect the answer is "No"?

Similarly, are Flink savepoints and checkpoitns still valid after some job graph changes e.g. one stateful operator splits into two? I expect the answer is "No"?

 

Regards,

 

Min

 



E-mails can involve SUBSTANTIAL RISKS, e.g. lack of confidentiality, potential manipulation of contents and/or sender's address, incorrect recipient (misdirection), viruses etc. Based on previous e-mail correspondence with you and/or an agreement reached with you, UBS considers itself authorized to contact you via e-mail. UBS assumes no responsibility for any loss or damage resulting from the use of e-mails.
The recipient is aware of and accepts the inherent risks of using e-mails, in particular the risk that the banking relationship and confidential information relating thereto are disclosed to third parties.
UBS reserves the right to retain and monitor all messages. Messages are protected and accessed only in legally justified cases.
For information on how UBS uses and discloses personal data, how long we retain it, how we keep it secure and your data protection rights, please see our Privacy Notice http://www.ubs.com/privacy-statement
Reply | Threaded
Open this post in threaded view
|

Re: Savepoints and checkpoints

Yun Tang

Hi Min

 

Since kafka consumer would store KafkaTopicPartition [1] within checkpoint, you cannot load previous state if you changed the kafka topic name.

 

If you assign operator-id to previous stateful operator and splits into two operator but still maintain one new operator as previous operator-id, Flink would try to assign previous state to that new operator. Otherwise, previous state would not match any operator and you need to consider allow non-restored state if choose to resume from previous checkpoint/savepoint [3].

 

[1] https://github.com/apache/flink/blob/b290230662fa1aa38909aed40ac85eaf843e1d1c/flink-connectors/flink-connector-kafka-base/src/main/java/org/apache/flink/streaming/connectors/kafka/FlinkKafkaConsumerBase.java#L902

[2] https://ci.apache.org/projects/flink/flink-docs-stable/ops/state/savepoints.html#assigning-operator-ids

[3] https://ci.apache.org/projects/flink/flink-docs-stable/ops/state/savepoints.html#what-happens-if-i-delete-an-operator-that-has-state-from-my-job

 

Best

Yun Tang

 

 

From: "[hidden email]" <[hidden email]>
Date: Thursday, November 21, 2019 at 5:19 PM
To: "[hidden email]" <[hidden email]>
Subject: Savepoints and checkpoints

 

Hi,

 

Are Flink savepoints and checkpoitns still vlaid after some data entity changes e.g. Kafka topic name changes? I expect the answer is "No"?

Similarly, are Flink savepoints and checkpoitns still valid after some job graph changes e.g. one stateful operator splits into two? I expect the answer is "No"?

 

Regards,

 

Min

 

Reply | Threaded
Open this post in threaded view
|

Re: Savepoints and checkpoints

Congxian Qiu
Hi

First, Checkpoint for Flink is a distributed snapshot of the job. 
As Yun said, Kafka consumer will snapshot the topic name and partition to the checkpoint, then when restoring from the last checkpoint you do not know about the newly topic name.
Inner the checkpoint, you can think checkpoint as collections of key-value pair, the key is operatorid and value is the snapshot of the operator, operatorid will be generated automatically if you do not set it, and you can disable the automatically generate by calling `ExecutionConfig#disableAutoGeneratedUIDs`[1], and it will fail the job submission if any operator does not contain a custom unique ID.

Best,
Congxian


Yun Tang <[hidden email]> 于2019年11月22日周五 上午2:20写道:

Hi Min

 

Since kafka consumer would store KafkaTopicPartition [1] within checkpoint, you cannot load previous state if you changed the kafka topic name.

 

If you assign operator-id to previous stateful operator and splits into two operator but still maintain one new operator as previous operator-id, Flink would try to assign previous state to that new operator. Otherwise, previous state would not match any operator and you need to consider allow non-restored state if choose to resume from previous checkpoint/savepoint [3].

 

[1] https://github.com/apache/flink/blob/b290230662fa1aa38909aed40ac85eaf843e1d1c/flink-connectors/flink-connector-kafka-base/src/main/java/org/apache/flink/streaming/connectors/kafka/FlinkKafkaConsumerBase.java#L902

[2] https://ci.apache.org/projects/flink/flink-docs-stable/ops/state/savepoints.html#assigning-operator-ids

[3] https://ci.apache.org/projects/flink/flink-docs-stable/ops/state/savepoints.html#what-happens-if-i-delete-an-operator-that-has-state-from-my-job

 

Best

Yun Tang

 

 

From: "[hidden email]" <[hidden email]>
Date: Thursday, November 21, 2019 at 5:19 PM
To: "[hidden email]" <[hidden email]>
Subject: Savepoints and checkpoints

 

Hi,

 

Are Flink savepoints and checkpoitns still vlaid after some data entity changes e.g. Kafka topic name changes? I expect the answer is "No"?

Similarly, are Flink savepoints and checkpoitns still valid after some job graph changes e.g. one stateful operator splits into two? I expect the answer is "No"?

 

Regards,

 

Min