How watermark is generated in sql DDL statement

classic Classic list List threaded Threaded
6 messages Options
Reply | Threaded
Open this post in threaded view
|

How watermark is generated in sql DDL statement

lec ssmi
Hi:
   In sql API , the declaration of watermark is realized by ddl statement . But which way is it implemented?
   PeriodicWatermark   or   PunctuatedWatermark
  There seems to be  no explanation on the official website.
   
  Thanks.
Reply | Threaded
Open this post in threaded view
|

Re: How watermark is generated in sql DDL statement

Benchao Li
Hi lec ssmi,

It's a good question. In blink planner, we use code gen to handle watermark expression.
And in `WatermarkAssignerOperator` we calculate current watermark when each element comes in.
If the watermark - lastEmitedWatermark > watermark interval, we will emit the new watermark.

So it's neither `PeriodicWatermark` nor `PunctuatedWatermark`. 

lec ssmi <[hidden email]> 于2020年4月17日周五 下午3:12写道:
Hi:
   In sql API , the declaration of watermark is realized by ddl statement . But which way is it implemented?
   PeriodicWatermark   or   PunctuatedWatermark
  There seems to be  no explanation on the official website.
   
  Thanks.


--
Benchao Li
School of Electronics Engineering and Computer Science, Peking University
Tel:+86-15650713730
Email: [hidden email]; [hidden email]
Reply | Threaded
Open this post in threaded view
|

Re: How watermark is generated in sql DDL statement

lec ssmi
Maybe you are all right. I was  more confused  .
As the cwiki said, flink could use BoundedOutOfOrderTimestamps , 
image.png

but I have heard about WatermarkAssignerOperator from Blink developers. 

Benchao Li <[hidden email]> 于2020年4月17日周五 下午4:33写道:
Hi lec ssmi,

It's a good question. In blink planner, we use code gen to handle watermark expression.
And in `WatermarkAssignerOperator` we calculate current watermark when each element comes in.
If the watermark - lastEmitedWatermark > watermark interval, we will emit the new watermark.

So it's neither `PeriodicWatermark` nor `PunctuatedWatermark`. 

lec ssmi <[hidden email]> 于2020年4月17日周五 下午3:12写道:
Hi:
   In sql API , the declaration of watermark is realized by ddl statement . But which way is it implemented?
   PeriodicWatermark   or   PunctuatedWatermark
  There seems to be  no explanation on the official website.
   
  Thanks.


--
Benchao Li
School of Electronics Engineering and Computer Science, Peking University
Tel:+86-15650713730
Email: [hidden email]; [hidden email]
Reply | Threaded
Open this post in threaded view
|

Re: How watermark is generated in sql DDL statement

Benchao Li
WatermarkAssignerOperator is an inner mechanism for generating watermarks.

The "Bounded Out of Orderness" is just one kind of the watermark expressions, which 
is most commonly used.

The main logic of WatermarkAssignerOperator is:
- keep currentWatermark and lastWatermark
- when each element comes in
  - get watermark from this element, using the watermark expression 
  - if the watermark > currentWatermark, then currentWatermark is updated
  - if currentWatermark - lastWatermark > watermarkInterval
    - emit watermark to downstream, and update lastWatermark

lec ssmi <[hidden email]> 于2020年4月17日周五 下午4:50写道:
Maybe you are all right. I was  more confused  .
As the cwiki said, flink could use BoundedOutOfOrderTimestamps , 
image.png

but I have heard about WatermarkAssignerOperator from Blink developers. 

Benchao Li <[hidden email]> 于2020年4月17日周五 下午4:33写道:
Hi lec ssmi,

It's a good question. In blink planner, we use code gen to handle watermark expression.
And in `WatermarkAssignerOperator` we calculate current watermark when each element comes in.
If the watermark - lastEmitedWatermark > watermark interval, we will emit the new watermark.

So it's neither `PeriodicWatermark` nor `PunctuatedWatermark`. 

lec ssmi <[hidden email]> 于2020年4月17日周五 下午3:12写道:
Hi:
   In sql API , the declaration of watermark is realized by ddl statement . But which way is it implemented?
   PeriodicWatermark   or   PunctuatedWatermark
  There seems to be  no explanation on the official website.
   
  Thanks.


--
Benchao Li
School of Electronics Engineering and Computer Science, Peking University
Tel:+86-15650713730
Email: [hidden email]; [hidden email]


--
Benchao Li
School of Electronics Engineering and Computer Science, Peking University
Tel:+86-15650713730
Email: [hidden email]; [hidden email]
Reply | Threaded
Open this post in threaded view
|

Re: How watermark is generated in sql DDL statement

lec ssmi
I think you are all right. I have I checked the source code of WatermarkAssignerOperator, and I have found  the WatermarkGenerator  passed in  WatermarkAssignerOperator  is  the  interface WatermarkGeneratorAnd BoundedOutOfOrderWatermarkGenerator  is the only implementation class of WatermarkGenerator. By the way , the interval  is based processing time . 

Benchao Li <[hidden email]> 于2020年4月17日周五 下午5:06写道:
WatermarkAssignerOperator is an inner mechanism for generating watermarks.

The "Bounded Out of Orderness" is just one kind of the watermark expressions, which 
is most commonly used.

The main logic of WatermarkAssignerOperator is:
- keep currentWatermark and lastWatermark
- when each element comes in
  - get watermark from this element, using the watermark expression 
  - if the watermark > currentWatermark, then currentWatermark is updated
  - if currentWatermark - lastWatermark > watermarkInterval
    - emit watermark to downstream, and update lastWatermark

lec ssmi <[hidden email]> 于2020年4月17日周五 下午4:50写道:
Maybe you are all right. I was  more confused  .
As the cwiki said, flink could use BoundedOutOfOrderTimestamps , 
image.png

but I have heard about WatermarkAssignerOperator from Blink developers. 

Benchao Li <[hidden email]> 于2020年4月17日周五 下午4:33写道:
Hi lec ssmi,

It's a good question. In blink planner, we use code gen to handle watermark expression.
And in `WatermarkAssignerOperator` we calculate current watermark when each element comes in.
If the watermark - lastEmitedWatermark > watermark interval, we will emit the new watermark.

So it's neither `PeriodicWatermark` nor `PunctuatedWatermark`. 

lec ssmi <[hidden email]> 于2020年4月17日周五 下午3:12写道:
Hi:
   In sql API , the declaration of watermark is realized by ddl statement . But which way is it implemented?
   PeriodicWatermark   or   PunctuatedWatermark
  There seems to be  no explanation on the official website.
   
  Thanks.


--
Benchao Li
School of Electronics Engineering and Computer Science, Peking University
Tel:+86-15650713730
Email: [hidden email]; [hidden email]


--
Benchao Li
School of Electronics Engineering and Computer Science, Peking University
Tel:+86-15650713730
Email: [hidden email]; [hidden email]
Reply | Threaded
Open this post in threaded view
|

Re: How watermark is generated in sql DDL statement

Benchao Li
Actually, BoundedOutOfOrderWatermarkGenerator is only used in tests,
the real WatermarkGenerator is code generated in WatermarkGeneratorCodeGenerator

lec ssmi <[hidden email]> 于2020年4月17日周五 下午5:19写道:
I think you are all right. I have I checked the source code of WatermarkAssignerOperator, and I have found  the WatermarkGenerator  passed in  WatermarkAssignerOperator  is  the  interface WatermarkGeneratorAnd BoundedOutOfOrderWatermarkGenerator  is the only implementation class of WatermarkGenerator. By the way , the interval  is based processing time . 

Benchao Li <[hidden email]> 于2020年4月17日周五 下午5:06写道:
WatermarkAssignerOperator is an inner mechanism for generating watermarks.

The "Bounded Out of Orderness" is just one kind of the watermark expressions, which 
is most commonly used.

The main logic of WatermarkAssignerOperator is:
- keep currentWatermark and lastWatermark
- when each element comes in
  - get watermark from this element, using the watermark expression 
  - if the watermark > currentWatermark, then currentWatermark is updated
  - if currentWatermark - lastWatermark > watermarkInterval
    - emit watermark to downstream, and update lastWatermark

lec ssmi <[hidden email]> 于2020年4月17日周五 下午4:50写道:
Maybe you are all right. I was  more confused  .
As the cwiki said, flink could use BoundedOutOfOrderTimestamps , 
image.png

but I have heard about WatermarkAssignerOperator from Blink developers. 

Benchao Li <[hidden email]> 于2020年4月17日周五 下午4:33写道:
Hi lec ssmi,

It's a good question. In blink planner, we use code gen to handle watermark expression.
And in `WatermarkAssignerOperator` we calculate current watermark when each element comes in.
If the watermark - lastEmitedWatermark > watermark interval, we will emit the new watermark.

So it's neither `PeriodicWatermark` nor `PunctuatedWatermark`. 

lec ssmi <[hidden email]> 于2020年4月17日周五 下午3:12写道:
Hi:
   In sql API , the declaration of watermark is realized by ddl statement . But which way is it implemented?
   PeriodicWatermark   or   PunctuatedWatermark
  There seems to be  no explanation on the official website.
   
  Thanks.


--
Benchao Li
School of Electronics Engineering and Computer Science, Peking University
Tel:+86-15650713730
Email: [hidden email]; [hidden email]


--
Benchao Li
School of Electronics Engineering and Computer Science, Peking University
Tel:+86-15650713730
Email: [hidden email]; [hidden email]


--
Benchao Li
School of Electronics Engineering and Computer Science, Peking University
Tel:+86-15650713730
Email: [hidden email]; [hidden email]