Re: Duplicated data when using Externalized Checkpoints in a Flink Highly Available cluster
Posted by rmetzger0 on
URL: http://deprecated-apache-flink-user-mailing-list-archive.369.s1.nabble.com/Duplicated-data-when-using-Externalized-Checkpoints-in-a-Flink-Highly-Available-cluster-tp13301p13433.html
Hi Amara,
how are you validating whether your output contains duplicates?
If you are just writing the output to another Kafka topic or printing it to standard out, you will see duplicates even when exactly-once is working as intended.
Flink does not provide exactly-once delivery. Flink provides exactly-once semantics for registered state.
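For illustration, here is a minimal Java sketch of what "registered state" means (the class name CountPerKey and the example data are mine, not from this thread): a ValueState declared in a keyed operator is snapshotted with each checkpoint and restored on failure, so the counter reflects each input record exactly once, even though records emitted downstream after the last completed checkpoint may be emitted again.

import org.apache.flink.api.common.functions.RichFlatMapFunction;
import org.apache.flink.api.common.state.ValueState;
import org.apache.flink.api.common.state.ValueStateDescriptor;
import org.apache.flink.api.java.functions.KeySelector;
import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.configuration.Configuration;
import org.apache.flink.streaming.api.CheckpointingMode;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.util.Collector;

public class RegisteredStateExample {

    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        // Checkpoint every 10s; EXACTLY_ONCE refers to the effect on registered state.
        env.enableCheckpointing(10_000, CheckpointingMode.EXACTLY_ONCE);

        env.fromElements("a", "b", "a", "c", "a")
           .keyBy(new KeySelector<String, String>() {
               @Override
               public String getKey(String value) {
                   return value;
               }
           })
           .flatMap(new CountPerKey())
           .print();

        env.execute("Registered state example");
    }

    // Keyed counter: the ValueState below is "registered state". Flink
    // checkpoints and restores it, so after a failure each input record has
    // been counted exactly once -- but records printed after the last
    // checkpoint may show up again on the console (duplicates downstream).
    public static class CountPerKey extends RichFlatMapFunction<String, Tuple2<String, Long>> {

        private transient ValueState<Long> count;

        @Override
        public void open(Configuration parameters) {
            count = getRuntimeContext().getState(
                    new ValueStateDescriptor<>("count", Long.class));
        }

        @Override
        public void flatMap(String value, Collector<Tuple2<String, Long>> out) throws Exception {
            Long current = count.value();
            long updated = (current == null ? 0L : current) + 1L;
            count.update(updated);
            out.collect(Tuple2.of(value, updated));
        }
    }
}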
This means the sink (or whatever consumes the output) needs to cooperate with the system to achieve exactly-once results. For files, for example, you need to remove the invalid data left over from failed checkpoints; our BucketingSink does exactly that.
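As a rough sketch of that cooperation (assuming the flink-connector-filesystem dependency is on the classpath; the HDFS path, bucket format, and source are placeholders I made up): the job below enables externalized checkpoints in exactly-once mode and writes through the BucketingSink, which keeps part files in in-progress/pending state until the covering checkpoint completes, and on restore truncates (or marks via a valid-length file) whatever was written after the last successful checkpoint.

import org.apache.flink.streaming.api.CheckpointingMode;
import org.apache.flink.streaming.api.environment.CheckpointConfig.ExternalizedCheckpointCleanup;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.connectors.fs.bucketing.BucketingSink;
import org.apache.flink.streaming.connectors.fs.bucketing.DateTimeBucketer;

public class BucketingSinkJob {

    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        // Checkpoint every 60s in exactly-once mode and retain the checkpoint
        // externally so the job can be resumed from it after a cancellation or failure.
        env.enableCheckpointing(60_000, CheckpointingMode.EXACTLY_ONCE);
        env.getCheckpointConfig().enableExternalizedCheckpoints(
                ExternalizedCheckpointCleanup.RETAIN_ON_CANCELLATION);

        // The BucketingSink participates in checkpointing: data only becomes
        // final once the checkpoint that covers it completes, and invalid data
        // from aborted attempts is cleaned up (or flagged) on restore, so
        // readers that respect this see each record once.
        BucketingSink<String> sink = new BucketingSink<>("hdfs:///flink/output");
        sink.setBucketer(new DateTimeBucketer<String>("yyyy-MM-dd--HH"));

        env.socketTextStream("localhost", 9999)
           .addSink(sink);

        env.execute("Exactly-once file output with the BucketingSink");
    }
}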