Hi Desheng,
Welcome to the community!
What you’re asking alludes to the question: How does Flink support end-to-end (from external source to external sink, e.g. Kafka to a database) exactly-once delivery?
Whether or not that is supported depends on the guarantees of the source and sink and how they work with Flink’s checkpointing; see the overview of the guarantees here [1].
I think you already understand how sources work with checkpoints for exactly-once.
For sinks to be exactly-once, the external system needs to be able to participate in the checkpointing mechanism, so it depends on what the sink is.
That participation usually takes the form of a transaction that is only committed once the job's checkpoint has fully completed.
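To make that concrete, here is a minimal sketch of that pattern, not the implementation of any particular connector: the sink stages writes per checkpoint and only pushes them to the database once Flink signals that the checkpoint has fully completed. The commitToDatabase() call is a hypothetical stand-in for one atomic transaction against your system, and a complete implementation would also have to persist the staged writes in operator state so they survive a failure between the snapshot and the commit notification:

import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

import org.apache.flink.runtime.state.CheckpointListener;
import org.apache.flink.runtime.state.FunctionInitializationContext;
import org.apache.flink.runtime.state.FunctionSnapshotContext;
import org.apache.flink.streaming.api.checkpoint.CheckpointedFunction;
import org.apache.flink.streaming.api.functions.sink.RichSinkFunction;

public class TransactionalDbSink extends RichSinkFunction<String>
        implements CheckpointedFunction, CheckpointListener {

    // Writes staged since the last checkpoint barrier.
    private transient List<String> currentBuffer;
    // Writes frozen per checkpoint id, awaiting the completion notification.
    private transient Map<Long, List<String>> pendingByCheckpoint;

    @Override
    public void initializeState(FunctionInitializationContext context) {
        currentBuffer = new ArrayList<>();
        pendingByCheckpoint = new HashMap<>();
    }

    @Override
    public void invoke(String value) {
        // Stage the write; nothing reaches the database yet.
        currentBuffer.add(value);
    }

    @Override
    public void snapshotState(FunctionSnapshotContext context) {
        // The checkpoint barrier has passed this sink: freeze the staged
        // writes under this checkpoint id and start a fresh buffer.
        pendingByCheckpoint.put(context.getCheckpointId(), currentBuffer);
        currentBuffer = new ArrayList<>();
    }

    @Override
    public void notifyCheckpointComplete(long checkpointId) {
        // Only now is the checkpoint globally complete, so committing
        // cannot produce duplicates after a restore to this checkpoint.
        List<String> staged = pendingByCheckpoint.remove(checkpointId);
        if (staged != null) {
            commitToDatabase(staged);
        }
    }

    private void commitToDatabase(List<String> rows) {
        // Hypothetical: apply all rows in one atomic database transaction.
    }
}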
If the sink doesn’t support this, you can still achieve effectively exactly-once end-to-end results by making your database writes / updates idempotent, so that replaying the stream from t1 does not affect the end results.
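For example, if each record carries the full new value for its row (rather than an increment), a plain upsert is idempotent: running it once or five times leaves the database in the same state. A minimal JDBC sketch, with a made-up PostgreSQL table results(id, value) and connection details:

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;

public class IdempotentUpsert {
    public static void main(String[] args) throws Exception {
        // Hypothetical connection details and table; adjust to your setup.
        try (Connection conn = DriverManager.getConnection(
                "jdbc:postgresql://localhost:5432/mydb", "user", "password")) {
            PreparedStatement stmt = conn.prepareStatement(
                    "INSERT INTO results (id, value) VALUES (?, ?) "
                    + "ON CONFLICT (id) DO UPDATE SET value = EXCLUDED.value");
            stmt.setString(1, "key-42"); // the record's unique key
            stmt.setLong(2, 123L);       // the full new value, not a delta
            stmt.executeUpdate();        // safe to replay after recovery
        }
    }
}

Note the caveat: an update like SET value = value + 1 is not idempotent, because replaying the stream between t1 and t2 would apply the increment twice. So for this approach, the records themselves must carry absolute values.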
Cheers,
Gordon
[1] https://ci.apache.org/projects/flink/flink-docs-release-1.3/dev/connectors/guarantees.html
On 27 June 2017 at 11:36:00 AM, ZalaCheung ([hidden email]) wrote:
Hi Flink Community,
I am new to Flink and am now looking at Flink's checkpointing. After reading the documentation, I am still confused. Here is the scenario:

I have a datastream that finally flows to a database sink, and I update one of the fields in the database based on the incoming stream. Suppose snapshot t1 has completed and snapshot t2 is in progress. After snapshot t1 completed, I updated my database several times. Then, BEFORE snapshot t2 completed, my task failed.

Based on my understanding of checkpointing in Flink, I will recover from snapshot t1 and re-execute the stream between t1 and t2. But I have already updated my database several times after t1. Does Flink deal with this issue?
Best,
Desheng Zhang