http://deprecated-apache-flink-user-mailing-list-archive.369.s1.nabble.com/Question-about-watermark-and-window-tp15111p15189.html
I think your analyses are correct. Especially, yes, if you re-read the data the (ts=3) data should still be considered late if both consumers read with the same speed. If, however, (ts=3) is read before the other consumer reads (ts=8) then it should not be considered late, as you said.
> On 24. Aug 2017, at 15:49, Tony Wei <
[hidden email]> wrote:
>
> Hi,
>
> Recently, I studied about watermark from Flink documents and blogs.
>
> I have some question about this scenario below.
>
> Suppose there are five clients sending events with different time to the topic on Kafka.
> Topic has two partitions and five events' timestamp are (ts=1), (ts=2), (ts=3), (ts=8), (ts=9).
> The Flink streaming job uses the following setting:
> 1. use AscendingTimestampExtractor
> 2. client time as timestamp
> 3. use tumbling window with 5 unit window size
> 4. not allow late event
>
> If the client events out of order like this.
> Partition A [(ts=1), (ts=8)]
> Partition B [(ts=2), (ts=9)] <= (ts=3) delay
> Should the window function emit [(ts=1), (ts=2)], keep [(ts=8), (ts=9] in state and drop out (ts=3) ?
>
> If all events has come, and then replay the job from the beginning, the partition state would be
> Partition A [(ts=1), (ts=8)]
> Partition B [(ts=2), (ts=9), (ts=3)]
> Suppose two consumers fetch events with same speed, should the result be the same as above?
> If consumer B reads (ts=3) earlier than consumer A reads (ts=8), would (ts=3) be placed in the window before watermark becomes to 8 and then emit [(ts=1), (ts=2), (ts=3)] as result?
>
> I wonder if those questions are all correct. If not, is there any mechanisms about watermark and window in Flink that I missed.
>
> Thank for your help.
>
> Best Regards,
> Tony Wei
>