Guarantee of event-time order in FlinkKafkaConsumer

Wojciech Indyk
Hi!
I use Flink 1.8.0 with Kafka 2.2.1. I need to guarantee that events are processed in the correct order by event timestamp. I generate periodic watermarks every 1 s, and I use FlinkKafkaConsumer with AscendingTimestampExtractor.
The code (and the same question) is here: https://stackoverflow.com/questions/58539379/guarantee-of-event-time-order-in-flinkkafkaconsumer

I realized that Flink does not correct the order of out-of-order events that arrive in the same millisecond or a few milliseconds late. In the docs I found: "the watermark triggers computation of all windows where the maximum timestamp (which is end-timestamp - 1) is smaller than the new watermark". So I added a timeWindowAll step with a window size of 100 ms, and inside that window I sort the messages by event timestamp. It works, but the solution is ugly and feels like a workaround. I am also concerned about how it interacts with the per-partition watermarks of the Kafka source.
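
To make the workaround concrete, here is a minimal plain-Java sketch of the idea, with no Flink dependency. It is only an illustration of the mechanism, not the code from the StackOverflow question. The class name WindowSortSketch and its methods are hypothetical, events are reduced to bare timestamps, and the firing condition follows the quoted sentence literally (a window fires once its maximum timestamp is smaller than the watermark). Dropping events whose window has already fired is an assumption made for the sketch.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.TreeMap;

// Hypothetical sketch: buffer timestamps in tumbling windows and emit each
// window's content sorted once the watermark has passed the window.
class WindowSortSketch {
    private final long windowSizeMs;
    // window start -> timestamps buffered for that window, ordered by start
    private final TreeMap<Long, List<Long>> windows = new TreeMap<>();
    private long currentWatermark = Long.MIN_VALUE;

    WindowSortSketch(long windowSizeMs) {
        this.windowSizeMs = windowSizeMs;
    }

    // Buffers a timestamp. Returns false if its window has already fired,
    // in which case the event is dropped (an assumption of this sketch).
    boolean add(long timestamp) {
        long start = timestamp - Math.floorMod(timestamp, windowSizeMs);
        long maxTimestamp = start + windowSizeMs - 1;
        if (maxTimestamp < currentWatermark) {
            return false;
        }
        windows.computeIfAbsent(start, k -> new ArrayList<>()).add(timestamp);
        return true;
    }

    // Advances the watermark (it never moves backwards) and returns the
    // content of every window whose maximum timestamp is now smaller than
    // the watermark, sorted within each window.
    List<Long> advanceWatermark(long watermark) {
        currentWatermark = Math.max(currentWatermark, watermark);
        List<Long> out = new ArrayList<>();
        while (!windows.isEmpty()
                && windows.firstKey() + windowSizeMs - 1 < currentWatermark) {
            List<Long> fired = windows.pollFirstEntry().getValue();
            Collections.sort(fired);
            out.addAll(fired);
        }
        return out;
    }
}
```

With a window size of 100, adding 105, 103, 250 and 101 and then advancing the watermark to 200 releases the window covering 100 to 199 in sorted order, while 250 stays buffered until a later watermark. Since windows fire in order of their start and each one is sorted internally, the output is ordered overall, but an event that arrives after its window has fired is lost in this sketch.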

Ideally I would like to enforce the ordering guarantee in the Kafka source itself and keep it per Kafka partition, in the same way that watermarks are tracked per partition. Is that possible? What is the current best way to guarantee event-time order of events in Flink?

--
Kind regards/ Pozdrawiam,
Wojciech Indyk

Re: Guarantee of event-time order in FlinkKafkaConsumer

Fabian Hueske-2
Hi Wojciech,

I posted an answer on StackOverflow.

Best, Fabian

On Thu, 24 Oct 2019 at 13:03, Wojciech Indyk <[hidden email]> wrote: