Hi All, How are timestamps treated within an iterative DataStream loop within Flink? For example, here is an example of a simple iterative loop within Flink where the feedback loop is of a different type to the input stream:
My questions revolve around how does Flink use timestamps within a feedback loop:
John
Question also posted to StackOverflow here: https://stackoverflow.com/questions/56506020/how-does-flink-treat-timestamps-within-iterative-loops
|
Hi John, As a whole, I think currently Flink does not have special mechanism for event-time in iteration. This means the IterationHead treats the initial input and the feedback input as two normal inputs and use the same mechanism with the tasks outside the iteration. This may cause disorder of the event-time inside the iteration. The event time relies on the watermark alignment mechanism to mark the least event-time of the following records. Suppose we have a watermark with event-time 10, The iteration head will first receive the watermark from the initial input, and then receive it again from the feedback input after the first round of iteration. Then IterationHead will think the watermarks have aligned at the event-time 10, so it will emits the watermark with event-time 10 to the final output, which means that it will not receive and emit records whose event-time is less than 10. However, since the records may iterate multiple rounds, the IterationHead may still receive the records whose event-time is less than 10 again in the following rounds of iteration. Then the disorder of the event-time occurs. Best, Yun Gao
|
Free forum by Nabble | Edit this page |