Re: Duplicated data when using Externalized Checkpoints in a Flink Highly Available cluster
Posted by F.Amara on
URL: http://deprecated-apache-flink-user-mailing-list-archive.369.s1.nabble.com/Duplicated-data-when-using-Externalized-Checkpoints-in-a-Flink-Highly-Available-cluster-tp13301p13379.html
Hi Gordon,
Thanks a lot for the reply.
The events are produced by a KafkaProducer, written to a topic, and consumed from there by the Flink application through a FlinkKafkaConsumer. I verified that during a failure-recovery scenario (of the Flink application) the KafkaProducer was not interrupted, so no duplicate values were sent from the data source. However, when I observed the output of the FlinkKafkaConsumer, I noticed duplicates starting from the recovery point onwards. Is the FlinkKafkaConsumer capable of introducing duplicates?
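For context, the job is wired up roughly like the sketch below (a simplified reconstruction, not my exact code: the topic name, broker address, checkpoint interval, and consumer version are placeholders):

```java
import java.util.Properties;

import org.apache.flink.streaming.api.CheckpointingMode;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.CheckpointConfig;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumer09;
import org.apache.flink.streaming.util.serialization.SimpleStringSchema;

public class KafkaConsumerJob {

    // Kafka consumer settings (broker address and group id are placeholders).
    static Properties consumerProps() {
        Properties props = new Properties();
        props.setProperty("bootstrap.servers", "localhost:9092");
        props.setProperty("group.id", "flink-consumer-group");
        return props;
    }

    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env =
                StreamExecutionEnvironment.getExecutionEnvironment();

        // Exactly-once checkpointing every 5 s (interval is a placeholder).
        env.enableCheckpointing(5000, CheckpointingMode.EXACTLY_ONCE);

        // Externalized checkpoints, retained on cancellation so the job
        // can be restarted from them after a failure.
        env.getCheckpointConfig().enableExternalizedCheckpoints(
                CheckpointConfig.ExternalizedCheckpointCleanup.RETAIN_ON_CANCELLATION);

        // Consume from the topic the KafkaProducer writes to
        // ("events" is a placeholder topic name).
        DataStream<String> stream = env.addSource(
                new FlinkKafkaConsumer09<>(
                        "events", new SimpleStringSchema(), consumerProps()));

        // Print the consumed records; this is where I observe the duplicates
        // after recovery.
        stream.print();

        env.execute("kafka-consumer-job");
    }
}
```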
How can I achieve exactly-once processing for my application? Could you please guide me on what I might have missed?
Thanks,
Amara