What happens when all input partitions become idle

classic Classic list List threaded Threaded
3 messages Options
Reply | Threaded
Open this post in threaded view
|

What happens when all input partitions become idle

Dongwon Kim-2
Hi,

Let's consider two operators: A (parallelism=2) and B (parallelism=1).
B has two input partitions, B_A1 and B_A2, which are connected to A1 and A2 respectively.

At some point, 
- B_A1's watermark : 12
- B_A2's watermark : 10
- B's event-time clock : 10 = min(12, 10)
- B has registered a timer at 12
- No data will be fed into the pipeline for the next few hours, but I want the timer to be fired after a few seconds if no data is coming.

After adopting a watermark strategy explained in [1], I found that the timer is fired as wished! That's awesome! 

But I want to know what happens inside in detail.
Based on my current understanding of how watermark is calculated [2], I cannot come up with what happens inside when idleness is considered.
If B_A2 is marked idle earlier than B_A1, is B's event-time clock calculated as min(12, MAX_WATERMARK)? 

Thanks,

Dongwon

Reply | Threaded
Open this post in threaded view
|

Re: What happens when all input partitions become idle

Benchao Li-2

Dongwon Kim <[hidden email]> 于2020年12月10日周四 上午12:21写道:
Hi,

Let's consider two operators: A (parallelism=2) and B (parallelism=1).
B has two input partitions, B_A1 and B_A2, which are connected to A1 and A2 respectively.

At some point, 
- B_A1's watermark : 12
- B_A2's watermark : 10
- B's event-time clock : 10 = min(12, 10)
- B has registered a timer at 12
- No data will be fed into the pipeline for the next few hours, but I want the timer to be fired after a few seconds if no data is coming.

After adopting a watermark strategy explained in [1], I found that the timer is fired as wished! That's awesome! 

But I want to know what happens inside in detail.
Based on my current understanding of how watermark is calculated [2], I cannot come up with what happens inside when idleness is considered.
If B_A2 is marked idle earlier than B_A1, is B's event-time clock calculated as min(12, MAX_WATERMARK)? 

Thanks,

Dongwon



--

Best,
Benchao Li
Reply | Threaded
Open this post in threaded view
|

Re: What happens when all input partitions become idle

Dongwon Kim-2
Hi Benchao,

Thanks for the input.

The code is self-explanatory.

Best,

Dongwon


On Thu, Dec 10, 2020 at 12:20 PM Benchao Li <[hidden email]> wrote:

Dongwon Kim <[hidden email]> 于2020年12月10日周四 上午12:21写道:
Hi,

Let's consider two operators: A (parallelism=2) and B (parallelism=1).
B has two input partitions, B_A1 and B_A2, which are connected to A1 and A2 respectively.

At some point, 
- B_A1's watermark : 12
- B_A2's watermark : 10
- B's event-time clock : 10 = min(12, 10)
- B has registered a timer at 12
- No data will be fed into the pipeline for the next few hours, but I want the timer to be fired after a few seconds if no data is coming.

After adopting a watermark strategy explained in [1], I found that the timer is fired as wished! That's awesome! 

But I want to know what happens inside in detail.
Based on my current understanding of how watermark is calculated [2], I cannot come up with what happens inside when idleness is considered.
If B_A2 is marked idle earlier than B_A1, is B's event-time clock calculated as min(12, MAX_WATERMARK)? 

Thanks,

Dongwon



--

Best,
Benchao Li