Flink Checkpointing in production


Flink Checkpointing in production

hassahma
Hi All, 

We need two clarifications for using Flink 1.6.0. We have Flink jobs handling hundreds of tenants with a sliding window of 24 hours and a slide of 5 minutes.
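For context, a minimal sketch of the topology in question (the event type, timestamps, and the sum aggregation are placeholder assumptions, not our real job):

import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.streaming.api.TimeCharacteristic;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.api.functions.timestamps.AscendingTimestampExtractor;
import org.apache.flink.streaming.api.windowing.assigners.SlidingEventTimeWindows;
import org.apache.flink.streaming.api.windowing.time.Time;

public class TenantWindowSketch {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        env.setStreamTimeCharacteristic(TimeCharacteristic.EventTime);

        // Toy stand-in for the real per-tenant event stream: (tenantId, count).
        DataStream<Tuple2<String, Long>> events = env
                .fromElements(Tuple2.of("tenantA", 1L), Tuple2.of("tenantA", 2L), Tuple2.of("tenantB", 1L))
                .assignTimestampsAndWatermarks(new AscendingTimestampExtractor<Tuple2<String, Long>>() {
                    @Override
                    public long extractAscendingTimestamp(Tuple2<String, Long> element) {
                        return System.currentTimeMillis(); // placeholder event timestamps
                    }
                });

        events.keyBy(0) // one window instance per tenant
              .window(SlidingEventTimeWindows.of(Time.hours(24), Time.minutes(5)))
              .sum(1)    // placeholder aggregation
              .print();

        env.execute("Per-tenant 24h/5min sliding window sketch");
    }
}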

1) If checkpointing is enabled and the Flink job crashes in the middle of emitting results to the Kafka producer, what happens to the partial results the producer has already sent out once the job resumes from the last known checkpoint? Do we end up sending duplicates for the same window after recovery, or does the window never get triggered again because we have moved forward in time?

2) If one input event in Flink produces 100 output messages to the Kafka producer after a window expires, how will setFlushOnCheckpoint work in the Kafka producer? I am confused about how it can ensure that all records before the checkpoint have been sent out, given that we create 100 output events from a single input event.

Thanks for the help.

Best Regards,

Re: Flink Checkpointing in production

vino yang
Hi Ahmad,

Answers to your questions are inline:

Ahmad Hassan <[hidden email]> wrote on Wed, Sep 12, 2018 at 4:39 PM:
Hi All, 

We need two clarifications for using Flink 1.6.0. We have Flink jobs handling hundreds of tenants with a sliding window of 24 hours and a slide of 5 minutes.

1) If checkpointing is enabled and the Flink job crashes in the middle of emitting results to the Kafka producer, what happens to the partial results the producer has already sent out once the job resumes from the last known checkpoint? Do we end up sending duplicates for the same window after recovery, or does the window never get triggered again because we have moved forward in time?

Which version of Kafka do you use? Kafka has supported producer transactions since version 0.11. If you use the 0.11 version of the connector and set the checkpointing mode to exactly-once, the entire pipeline will guarantee end-to-end exactly-once consistency.
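For reference, a minimal sketch of wiring this up with the Flink 1.6 Kafka 0.11 connector (broker address, topic name, checkpoint interval, and the stream contents are placeholders):

import java.util.Properties;

import org.apache.flink.api.common.serialization.SimpleStringSchema;
import org.apache.flink.streaming.api.CheckpointingMode;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.connectors.kafka.FlinkKafkaProducer011;
import org.apache.flink.streaming.util.serialization.KeyedSerializationSchemaWrapper;

public class ExactlyOnceKafkaSinkSketch {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        // Exactly-once checkpoints; the interval is an arbitrary example.
        env.enableCheckpointing(60_000L, CheckpointingMode.EXACTLY_ONCE);

        // Stand-in for the real windowed results stream.
        DataStream<String> results = env.fromElements("tenantA:42", "tenantB:7");

        Properties props = new Properties();
        props.setProperty("bootstrap.servers", "localhost:9092"); // placeholder broker
        // Must cover the checkpoint interval plus recovery time, and stay within
        // the broker's transaction.max.timeout.ms, or Kafka aborts the open transaction.
        props.setProperty("transaction.timeout.ms", "900000");

        // EXACTLY_ONCE writes each checkpoint's records inside a Kafka transaction
        // that commits only after the checkpoint completes, so a replayed window
        // cannot produce visible duplicates.
        results.addSink(new FlinkKafkaProducer011<>(
                "results-topic", // placeholder topic
                new KeyedSerializationSchemaWrapper<>(new SimpleStringSchema()),
                props,
                FlinkKafkaProducer011.Semantic.EXACTLY_ONCE));

        env.execute("Exactly-once Kafka sink sketch");
    }
}

Note that downstream consumers must also set isolation.level=read_committed, otherwise they will see records from uncommitted or aborted transactions.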

2) If one input event in Flink produces 100 output messages to the Kafka producer after a window expires, how will setFlushOnCheckpoint work in the Kafka producer? I am confused about how it can ensure that all records before the checkpoint have been sent out, given that we create 100 output events from a single input event.

Following up on the first answer: if your source connector can also guarantee exactly-once semantics (the Kafka consumer can, because it rewinds to the offsets stored in the checkpoint), then the whole process is fine end to end. Flink's checkpoint mechanism takes a globally consistent snapshot of the entire job; it is not just an independent snapshot of a particular operator.
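As a supplement on setFlushOnCheckpoint: it operates on whatever records have reached the sink, so a 1-to-100 fan-out upstream simply means 100 pending records to flush before the checkpoint barrier is acknowledged. A minimal sketch with a pre-0.11 connector (the 0.9 connector here; broker, topic, and stream contents are placeholders):

import java.util.Properties;

import org.apache.flink.api.common.serialization.SimpleStringSchema;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.connectors.kafka.FlinkKafkaProducer09;

public class FlushOnCheckpointSketch {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        env.enableCheckpointing(60_000L); // example interval

        // Stand-in for the windowed results; one input event fanning out to 100
        // records here changes nothing below, only the number of pending records.
        DataStream<String> results = env.fromElements("r1", "r2", "r3");

        Properties props = new Properties();
        props.setProperty("bootstrap.servers", "localhost:9092"); // placeholder broker

        FlinkKafkaProducer09<String> producer =
                new FlinkKafkaProducer09<>("results-topic", new SimpleStringSchema(), props);

        // On each checkpoint the sink blocks in its snapshot until every record
        // it has handed to the Kafka client has been acknowledged.
        producer.setFlushOnCheckpoint(true);
        // Fail the job (and restart from the checkpoint) on send errors instead
        // of merely logging them; required for at-least-once.
        producer.setLogFailuresOnly(false);

        results.addSink(producer);
        env.execute("Flush-on-checkpoint sketch");
    }
}

This setup gives at-least-once semantics: after recovery the window's output may be re-emitted, which ties back to the duplicates question above.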

Thanks for the help.

Best Regards,