Re: Fink: KafkaProducer Data Loss

Posted by ninad on
URL: http://deprecated-apache-flink-user-mailing-list-archive.369.s1.nabble.com/Fink-KafkaProducer-Data-Loss-tp11413p14522.html

Hi Gordon, I was able to reproduce the data loss on standalone flink cluster also. I have attached a stripped down version of our code here:

Environment:
Flink standalone 1.3.0
Kafka 0.9

What the code is doing:
-consume messages from kafka topic ('event.filter.topic' property in application.properties)
-group them by key
-analyze the events in a window and filter some messages.
-send remaining messages to kafka topc ('sep.http.topic' property in application.properties)

To build:
./gradlew clean assemble

The jar needs path to 'application.properties' file to run

Important properties in application.properties:
window.session.interval.sec
kafka.brokers
event.filter.topic --> source topic
sep.http.topic --> destination topic

To test:
-Use 'EventGenerator' class to publish messages to source kafka topic
        The data published won't be filtered by the logic. If you publish 10 messages to the source topic,
        those 10 messages should be sent to the destination topic.

-Once we  see that flink has received all the messages, bring down all kafka brokers

-Let Flink jobs fail for 2-3 times.

-Restart kafka brokers.

Note: Data loss isn't observed frequently. 1/4 times or so.

Thanks for all your help.

eventFilter.zip