Flink CheckPoint/Savepoint Behavior Question

Jason Liu
We currently have some logic that loads data from S3 into memory in our Flink/Kinesis Analytics app. This happens before the RichFunction.open() function is called.

We have two questions here, and I can't find much information on the apache.org website:

  1. (More of a clarification) When Flink takes a checkpoint or savepoint, only the state and the stream positions are saved, right? The in-memory content won't be saved and restored.

  2. When Flink restores from a checkpoint/savepoint, does it still go through the application initialization phase? Basically, will the code before the RichFunction's open() be run? If not, would the operators' open() functions run when Flink restores from a checkpoint/savepoint?

Thanks,
Jason

Re: Flink CheckPoint/Savepoint Behavior Question

raghav280392
Flink is aware of all the tasks running in the cluster. If any task fails, the failed task is restored from the latest checkpoint (state is restored only if the task uses Flink state, such as operator state). This scenario does not use savepoints. Savepoints are similar to checkpoints; the difference is that savepoints are created manually, or when we manually cancel/stop a job with a savepoint. We can then start the same job again by pointing it to the savepoint. If we start a job without a savepoint, the job starts with empty operator state.

Correct me if I am wrong.

Other references:

Thank you




--
Raghavendar T S


Re: Flink CheckPoint/Savepoint Behavior Question

Arvid Heise-4
Hi Jason,

You got it perfectly right. Everything that is not in explicit state (or checkpointed in CheckpointedFunction#snapshotState) is lost on recovery. However, Flink applications always go through the complete life-cycle, so open() is called again when a job is restored from a checkpoint or savepoint.

Note that I'd look into CheckpointedFunction if the side information that you fetch from S3 is not changing and is rather small.

Best,

Arvid
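
For readers who want to see the behavior described above in one place, here is a minimal sketch in plain Java. It is a toy model and does not use the Flink API: the class RestoreLifecycleSketch, the nested EnrichingOperator, the helper loadSideData() (a stand-in for the S3 fetch) and the start() method are all hypothetical names made up for illustration. The assumption it encodes is the one from the thread: on every start or restore, state is initialized first and open() runs afterwards, and only explicit state survives a restore.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Toy model of the restore life-cycle. NOT Flink code; all names are hypothetical. */
class RestoreLifecycleSketch {

    /** Stand-in for an operator that enriches records from an in-memory lookup table. */
    static class EnrichingOperator {
        // Plain in-memory field: never part of a snapshot, rebuilt in open().
        private Map<String, String> lookup;
        // Explicit state: the only value that goes into a snapshot.
        private long processed;
        // Records the order of life-cycle calls for inspection.
        final List<String> lifecycleLog = new ArrayList<>();

        /** Called first on every (re)start; 'restored' is null on a fresh start. */
        void initializeState(Long restored) {
            processed = (restored == null) ? 0L : restored;
            lifecycleLog.add("initializeState");
        }

        /** Called on every (re)start, so the side data is simply loaded again. */
        void open() {
            lookup = loadSideData();
            lifecycleLog.add("open");
        }

        String map(String key) {
            processed++;
            return key + "=" + lookup.getOrDefault(key, "?");
        }

        /** Only the explicit state is written to the snapshot. */
        long snapshotState() {
            return processed;
        }

        long processedCount() {
            return processed;
        }
    }

    /** Hypothetical stand-in for fetching side information from S3. */
    static Map<String, String> loadSideData() {
        Map<String, String> data = new HashMap<>();
        data.put("a", "alpha");
        data.put("b", "beta");
        return data;
    }

    /** Models a task start: a new instance, state initialization, then open(). */
    static EnrichingOperator start(Long snapshot) {
        EnrichingOperator op = new EnrichingOperator();
        op.initializeState(snapshot);
        op.open();
        return op;
    }

    public static void main(String[] args) {
        EnrichingOperator first = start(null);   // fresh start, empty state
        first.map("a");
        first.map("b");
        long checkpoint = first.snapshotState(); // only the counter is saved

        // Simulated failure: the first instance and its in-memory map are gone.
        EnrichingOperator restored = start(checkpoint);
        System.out.println(restored.lifecycleLog);
        System.out.println(restored.processedCount());
        System.out.println(restored.map("a"));
    }
}
```

In this model the lookup table is never saved, yet the restored instance can still use it because open() runs again. If the side data must be identical before and after a restore, keeping it in explicit state (as suggested above via CheckpointedFunction) would be the option to look at.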
