Hi,
Recently, I have been studying watermarks from the Flink documentation and blogs, and I have some questions about the scenario below.

Suppose there are five clients sending events with different timestamps to a Kafka topic. The topic has two partitions, and the five events' timestamps are (ts=1), (ts=2), (ts=3), (ts=8), (ts=9).

The Flink streaming job uses the following settings:
1. use AscendingTimestampExtractor
2. use client time as the timestamp
3. use a tumbling window with a window size of 5 units
4. do not allow late events

Suppose the client events arrive out of order like this:
Partition A [(ts=1), (ts=8)]
Partition B [(ts=2), (ts=9)] <= (ts=3) is delayed
Should the window function emit [(ts=1), (ts=2)], keep [(ts=8), (ts=9)] in state, and drop (ts=3)?

If all events have arrived and the job is then replayed from the beginning, the partition contents would be:
Partition A [(ts=1), (ts=8)]
Partition B [(ts=2), (ts=9), (ts=3)]
Suppose the two consumers fetch events at the same speed. Should the result be the same as above?
If consumer B reads (ts=3) before consumer A reads (ts=8), would (ts=3) be placed in the window before the watermark advances to 8, so that [(ts=1), (ts=2), (ts=3)] is emitted as the result?

I wonder whether my understanding in these questions is correct. If not, are there any watermark or window mechanisms in Flink that I have missed?

Thanks for your help.

Best Regards,
Tony Wei
Hi Tony,
I think your analyses are correct. In particular, yes: if you re-read the data, (ts=3) should still be considered late if both consumers read at the same speed. If, however, (ts=3) is read before the other consumer reads (ts=8), then it should not be considered late, as you said.

Best,
Aljoscha

> On 24. Aug 2017, at 15:49, Tony Wei <[hidden email]> wrote:
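The two read orders discussed above can be traced with a small simulation. This is only a minimal sketch in plain Python, not Flink code, and it rests on several simplifying assumptions: a watermark is produced after every event (Flink emits them periodically), each partition's watermark is its highest timestamp seen so far minus 1 (roughly what an ascending extractor does), the operator's watermark is the minimum over both partitions, and a window [start, start+5) fires once the watermark reaches start+4. The function name `simulate` and its return shape are made up for illustration.

```python
import math

WINDOW_SIZE = 5  # tumbling window size, in the same units as the timestamps


def simulate(arrivals):
    """Trace windows for events given as (partition, ts) in processing order.

    Returns (fired, still_open, dropped):
      fired      -- window start -> sorted timestamps emitted by that window
      still_open -- window start -> timestamps still held in state
      dropped    -- timestamps discarded as late (no allowed lateness)
    """
    partition_wm = {"A": -math.inf, "B": -math.inf}
    watermark = -math.inf
    still_open, fired, dropped = {}, {}, []

    for partition, ts in arrivals:
        start = ts - ts % WINDOW_SIZE
        # An event is late if its window's last timestamp is already
        # covered by the current watermark.
        if start + WINDOW_SIZE - 1 <= watermark:
            dropped.append(ts)
        else:
            still_open.setdefault(start, []).append(ts)

        # Assumption: per-partition watermark = max timestamp seen - 1;
        # it never moves backwards when an out-of-order event shows up.
        partition_wm[partition] = max(partition_wm[partition], ts - 1)
        # The operator watermark is the minimum over all partitions.
        watermark = min(partition_wm.values())

        # Fire every window whose last timestamp the watermark has passed.
        for s in sorted(still_open):
            if s + WINDOW_SIZE - 1 <= watermark:
                fired[s] = sorted(still_open.pop(s))

    return fired, still_open, dropped


# Both consumers read at the same speed: (ts=3) arrives after (ts=8), (ts=9).
same_speed = [("A", 1), ("B", 2), ("A", 8), ("B", 9), ("B", 3)]
# Consumer B reads (ts=3) before consumer A reads (ts=8).
b_ahead = [("A", 1), ("B", 2), ("B", 9), ("B", 3), ("A", 8)]

print(simulate(same_speed))
print(simulate(b_ahead))
```

Under these assumptions, the first order emits [1, 2] for the first window and drops 3, while the second order emits [1, 2, 3], because the overall watermark is held back by partition A until (ts=8) is read.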
Hi Aljoscha,

This is very helpful for understanding the behavior in such a scenario. Thank you very much!

Best Regards,
Tony Wei

2017-08-28 20:00 GMT+08:00 Aljoscha Krettek <[hidden email]>: