Implementing CheckpointableInputFormat

Lu Niu
Hi, Team

I am implementing a custom InputFormat. Should I implement the CheckpointableInputFormat interface? If I don't, does that mean the whole job has to restart when only one task fails? I ask because I found that all InputFormats implement CheckpointableInputFormat, which confuses me. Thank you!

Best
Lu

Re: Implementing CheckpointableInputFormat

Fabian Hueske-2
Hi,

CheckpointableInputFormat is only relevant if you plan to use the InputFormat in a MonitoringFileSource, i.e., in a streaming application.
If you plan to use it in a DataSet (batch) program, InputFormat is fine.

By the way, the latest release, Flink 1.9.0, has major improvements for the recovery of batch jobs.

Best, Fabian

(In reply to Lu Niu's message of Thu, Sep 5, 2019, 19:01.)

Re: Implementing CheckpointableInputFormat

Lu Niu
Hi, Fabian

Thanks for replying! 

I implemented a custom RichInputFormat that implements CheckpointableInputFormat, and I found that it is executed through InputFormatSourceFunction, which doesn't use CheckpointableInputFormat during execution. If so, how does checkpointing work here?

I also noticed that once one task has finished, I can no longer trigger a savepoint. It throws the exception "Not all tasks are running". Does that imply that no savepoint/checkpoint can be taken once any task finishes?

Best
Lu

(In reply to Fabian Hueske's message of Fri, Sep 6, 2019, 6:33 AM.)

Re: Implementing CheckpointableInputFormat

Chesnay Schepler
Implementing CheckpointableInputFormat only has an effect if the format is used through StreamExecutionEnvironment#createFileInput, which internally hands it to the MonitoringFileSource.
If you use StreamExecutionEnvironment#createInput, nothing is checkpointed for the source, and yes, this usually means the entire job has to be restarted if an error occurs.
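
To make "checkpointed for the source" a bit more concrete: the interface asks a format for two things, a serializable snapshot of its read position (getCurrentState) and a way to resume a split from such a snapshot (reopen). What follows is only a minimal, Flink-free sketch of that contract. The Checkpointable interface, InMemoryLineFormat, and readWithOneFailure are hypothetical names invented for illustration, and the split is modeled as a plain list rather than a real InputSplit.

```java
import java.io.Serializable;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Standalone model for illustration; none of these types are Flink classes.
class CheckpointableFormatSketch {

    /** Hypothetical stand-in for the contract: snapshot a position, resume from it. */
    interface Checkpointable<T extends Serializable> {
        T getCurrentState();                       // taken when a checkpoint happens
        void reopen(List<String> split, T state);  // used on recovery instead of open()
    }

    /** Reads records from an in-memory "split" and tracks how many were emitted. */
    static final class InMemoryLineFormat implements Checkpointable<Integer> {
        private List<String> split;
        private int offset;

        void open(List<String> split) {
            this.split = split;
            this.offset = 0;
        }

        boolean reachedEnd() {
            return offset >= split.size();
        }

        String nextRecord() {
            return split.get(offset++);
        }

        @Override
        public Integer getCurrentState() {
            return offset;
        }

        @Override
        public void reopen(List<String> split, Integer state) {
            this.split = split;
            this.offset = state;
        }
    }

    /** Reads two records, snapshots, simulates a failure, then resumes in a fresh instance. */
    static List<String> readWithOneFailure(List<String> split) {
        List<String> emitted = new ArrayList<>();
        InMemoryLineFormat first = new InMemoryLineFormat();
        first.open(split);
        emitted.add(first.nextRecord());
        emitted.add(first.nextRecord());
        Integer snapshot = first.getCurrentState();

        // Simulated task failure: the first reader is discarded here.
        InMemoryLineFormat recovered = new InMemoryLineFormat();
        recovered.reopen(split, snapshot);
        while (!recovered.reachedEnd()) {
            emitted.add(recovered.nextRecord());
        }
        return emitted;
    }

    public static void main(String[] args) {
        System.out.println(readWithOneFailure(Arrays.asList("a", "b", "c", "d")));
    }
}
```

Running main prints [a, b, c, d]: the recovered reader continues at the snapshotted offset, so no record is emitted twice or skipped. A format that is never asked for its state has nothing to resume from, so its source restarts from the beginning.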

Checkpoints/savepoints cannot be taken if any task is no longer running; see FLINK-2491.
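
As a rough illustration of that rule, here is a toy model of the gate, not Flink's actual coordinator code. CheckpointGateSketch, TaskState, and canTriggerCheckpoint are made-up names, and the real check may look at more than a per-task state.

```java
import java.util.Arrays;
import java.util.List;

// Toy model for illustration; these are not Flink's internal classes.
class CheckpointGateSketch {

    enum TaskState { RUNNING, FINISHED, FAILED }

    /** Returns true only if every task is still RUNNING. */
    static boolean canTriggerCheckpoint(List<TaskState> tasks) {
        for (TaskState state : tasks) {
            if (state != TaskState.RUNNING) {
                return false; // a single finished or failed task blocks the checkpoint
            }
        }
        return true;
    }

    public static void main(String[] args) {
        List<TaskState> allRunning = Arrays.asList(TaskState.RUNNING, TaskState.RUNNING);
        List<TaskState> oneFinished = Arrays.asList(TaskState.RUNNING, TaskState.FINISHED);
        System.out.println(canTriggerCheckpoint(allRunning));   // prints true
        System.out.println(canTriggerCheckpoint(oneFinished));  // prints false
    }
}
```

Under this model, a bounded source subtask that finishes early is enough to block every later checkpoint or savepoint, which matches the "Not all tasks are running" error reported earlier in the thread.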

(In reply to Lu Niu's message of 03/10/2019, 06:38.)