Hi,

I'm reading the documentation about checkpointing: https://ci.apache.org/projects/flink/flink-docs-master/internals/stream_checkpointing.html

It describes the case where an operator receives data from all of its incoming streams along with barriers. There's also an illustration of that case on the page. One thing confuses me, though. Each data stream has its own source, which emits its own sequence of barriers. This is very implementation specific, and the docs say that for incoming Kafka streams the connector uses offset information to produce a barrier. But the illustration and the explanation have "barrier n" in both of the incoming streams:

"As soon as the operator receives snapshot barrier n from an incoming stream, it cannot process any further records from that stream until it has received the barrier n from the other inputs as well"

Can you please help me understand why "barrier n" would appear in different streams? I'd expect "barrier n" in stream1 and "barrier m" in stream2.

Thank you,
Alex
Barrier n appearing in all the streams serves as a synchronization point. As explained in the subsequent paragraph of the docs:

"Otherwise, it would mix records that belong to snapshot n with records that belong to snapshot n+1."

Cheers
OK, I got it. Barrier n is an indicator of the n-th checkpoint.
My first impression was that barriers carry offset information, but that was wrong.

Thanks for unblocking ;-)
Alex
Hi Alex,

That's correct. The n refers to the n-th checkpoint. The checkpoint ID is important because operators need to align the barriers to ensure that they have consumed all inputs up to the point where the barriers were injected into the stream.
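
To make the alignment concrete, here is a minimal Python sketch of a two-input operator. It is a toy simulation, not Flink code: the names `Barrier` and `align` are made up for illustration, the inputs are plain lists read round-robin, and a blocked channel is simply not read (a real operator would buffer the incoming records instead). The barrier carries only the checkpoint ID, so the same "barrier 1" can sit at different positions in each stream.

```python
from collections import deque


class Barrier:
    """Hypothetical marker for checkpoint n; carries only the checkpoint ID."""

    def __init__(self, checkpoint_id):
        self.checkpoint_id = checkpoint_id


def align(inputs):
    """Read several input channels round-robin with barrier alignment.

    `inputs` is a list of lists holding records and Barrier markers.
    Returns (processed, snapshots), where snapshots maps a checkpoint ID
    to the records that were processed before that checkpoint was taken.
    """
    queues = [deque(channel) for channel in inputs]
    blocked = set()   # channels that already delivered the pending barrier
    processed = []
    snapshots = {}
    while any(queues):
        progressed = False
        for i, queue in enumerate(queues):
            if i in blocked or not queue:
                continue  # blocked channels are not read until alignment
            item = queue.popleft()
            progressed = True
            if isinstance(item, Barrier):
                blocked.add(i)
                if len(blocked) == len(queues):
                    # Barrier n arrived on every input: snapshot and unblock.
                    snapshots[item.checkpoint_id] = list(processed)
                    blocked.clear()
            else:
                processed.append(item)
        if not progressed:
            break  # remaining data sits behind a barrier that never aligns
    return processed, snapshots


stream1 = ["a1", "a2", "a3", Barrier(1), "a4"]
stream2 = ["b1", Barrier(1), "b2"]
processed, snapshots = align([stream1, stream2])
print(processed)
print(snapshots[1])
```

In this run, stream2 delivers barrier 1 early, so "b2" is held back until stream1 delivers its own barrier 1. The snapshot for checkpoint 1 therefore contains only records that preceded the barrier on both inputs, which is the mixing problem the docs describe being avoided.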