Checkpointing barriers

classic Classic list List threaded Threaded
4 messages Options
Reply | Threaded
Open this post in threaded view
|

Checkpointing barriers

Alexander Smirnov
Hi,

It describes a case, when an operator receives data from all its incoming streams alongs with barriers. There's also an illustration on that page for the case.

One thing confuses me, though.

Each data stream has its own source which emits own sequence of barriers. This is very implementation specific, and the docs say that for incoming Kafka streams the connector uses offset information to produce a barrier.
But the illustration and the explanation have "barrier n" in both of the incoming streams.

"As soon as the operator receives snapshot barrier n from an incoming stream, it cannot process any further records from that stream until it has received the barrier n from the other inputs as well"

Can you please help me to understand why "barrier n" would appear in different streams.
I'd expect "barrier n" in stream1 and "barrier m" in stream2

Thank you,
Alex
Reply | Threaded
Open this post in threaded view
|

Re: Checkpointing barriers

Ted Yu
barrier n appearing in all the streams serves as synchronization point.

As explained in the subsequent paragraph:

bq. Otherwise, it would mix records that belong to snapshot nand with records that belong to snapshot n+1.

Cheers

On Mon, Apr 23, 2018 at 7:21 AM, Alexander Smirnov <[hidden email]> wrote:
Hi,

It describes a case, when an operator receives data from all its incoming streams alongs with barriers. There's also an illustration on that page for the case.

One thing confuses me, though.

Each data stream has its own source which emits own sequence of barriers. This is very implementation specific, and the docs say that for incoming Kafka streams the connector uses offset information to produce a barrier.
But the illustration and the explanation have "barrier n" in both of the incoming streams.

"As soon as the operator receives snapshot barrier n from an incoming stream, it cannot process any further records from that stream until it has received the barrier n from the other inputs as well"

Can you please help me to understand why "barrier n" would appear in different streams.
I'd expect "barrier n" in stream1 and "barrier m" in stream2

Thank you,
Alex

Reply | Threaded
Open this post in threaded view
|

Re: Checkpointing barriers

Alexander Smirnov
ok, I got it. Barrier-n is an indicator or n-th checkpoint.

My first impression was that barriers are carrying offset information, but it was wrong.

Thanks for unblocking ;-)

Alex
Reply | Threaded
Open this post in threaded view
|

Re: Checkpointing barriers

Fabian Hueske-2
Hi Alex,

That's correct. The n refers to the n-th checkpoint. The checkpoint ID is important, because operators need to align the barriers to ensure that they consumed all inputs up to the point, where the barriers were injected into the stream.
Each operator checkpoints its own state. For sources, this could be the reading offset in a Kafka topic, or path and the byte offset in a file, etc.

Cheers, Fabian

2018-04-24 10:47 GMT+02:00 Alexander Smirnov <[hidden email]>:
ok, I got it. Barrier-n is an indicator or n-th checkpoint.

My first impression was that barriers are carrying offset information, but it was wrong.

Thanks for unblocking ;-)

Alex