pipeline.auto-watermark-interval vs setAutoWatermarkInterval

Aeden Jameson
I'm hoping to clear up my confusion about the following two settings:

1. pipeline.auto-watermark-interval
https://ci.apache.org/projects/flink/flink-docs-master/api/java/org/apache/flink/api/common/ExecutionConfig.html#setAutoWatermarkInterval-long-

2. setAutoWatermarkInterval
https://ci.apache.org/projects/flink/flink-docs-master/api/java/org/apache/flink/api/common/ExecutionConfig.html#setAutoWatermarkInterval-long-

I noticed that the default value of pipeline.auto-watermark-interval is 0,
and these docs,
https://ci.apache.org/projects/flink/flink-docs-release-1.12/dev/table/sql/create.html#watermark,
state: "If watermark interval is 0ms, the generated watermarks
will be emitted per-record if it is not null and greater than the last
emitted one." However, according to the documentation for
setAutoWatermarkInterval, the value 0 disables watermark emission.

* Are they intended to be the same setting? If not, how do they
differ? Is one for Flink SQL and the other for the DataStream API?

--
Thank you,
Aeden

Re: pipeline.auto-watermark-interval vs setAutoWatermarkInterval

Aeden Jameson
Correction: The first link was supposed to be,

1. pipeline.auto-watermark-interval
https://ci.apache.org/projects/flink/flink-docs-release-1.12/deployment/config.html#pipeline-auto-watermark-interval

On Wed, Mar 3, 2021 at 7:46 PM Aeden Jameson <[hidden email]> wrote:

>
> I'm hoping to have my confusion clarified regarding the settings,
>
> 1. pipeline.auto-watermark-interval
> https://ci.apache.org/projects/flink/flink-docs-master/api/java/org/apache/flink/api/common/ExecutionConfig.html#setAutoWatermarkInterval-long-
>
> 2. setAutoWatermarkInterval
> https://ci.apache.org/projects/flink/flink-docs-master/api/java/org/apache/flink/api/common/ExecutionConfig.html#setAutoWatermarkInterval-long-
>
> I noticed the default value of pipeline.auto-watermark-interval is 0
> and according to these docs,
> https://ci.apache.org/projects/flink/flink-docs-release-1.12/dev/table/sql/create.html#watermark,
> it states, "If watermark interval is 0ms, the generated watermarks
> will be emitted per-record if it is not null and greater than the last
> emitted one." However in the documentation for
> setAutoWatermarkInterval the value 0 disables watermark emission.
>
> * Are they intended to be the same setting? If not how are they
> different? Is one for FlinkSql and the other DataStream API?
>
> --
> Thank you,
> Aeden

Re: pipeline.auto-watermark-interval vs setAutoWatermarkInterval

Matthias
Hi Aeden,
Sorry for the late reply. I looked through the code and verified that the JavaDoc is correct. Setting pipeline.auto-watermark-interval to 0 will disable the automatic watermark generation. I created FLINK-21931 [1] to cover this.

Thanks,
Matthias


On Thu, Mar 4, 2021 at 9:53 PM Aeden Jameson <[hidden email]> wrote:
> Correction: The first link was supposed to be,
>
> 1. pipeline.auto-watermark-interval
> https://ci.apache.org/projects/flink/flink-docs-release-1.12/deployment/config.html#pipeline-auto-watermark-interval


Re: pipeline.auto-watermark-interval vs setAutoWatermarkInterval

Dawid Wysakowicz-2

Hey,

I would like to double-check this with Jark and/or Timo. As far as DataStream is concerned, the Javadoc is correct. Moreover, pipeline.auto-watermark-interval and setAutoWatermarkInterval are effectively the same setting/option. However, I am not sure whether the Table API interprets it in the same way as the DataStream API. The documentation you linked, Aeden, describes the SQL API.

@Jark @Timo Could you verify if the SQL documentation is correct?

Best,

Dawid

On 23/03/2021 15:20, Matthias Pohl wrote:
> Hi Aeden,
> Sorry for the late reply. I looked through the code and verified that the JavaDoc is correct. Setting pipeline.auto-watermark-interval to 0 will disable the automatic watermark generation. I created FLINK-21931 [1] to cover this.
>
> Thanks,
> Matthias



Re: pipeline.auto-watermark-interval vs setAutoWatermarkInterval

Jark Wu-3
IIUC, pipeline.auto-watermark-interval = 0 just disables **periodic** watermark emission;
it doesn't mean that watermarks will never be emitted.
In Table API/SQL, it has the same meaning. If the watermark interval is 0, we disable periodic watermark emission
and emit a watermark once it advances.

So I think the SQL documentation is correct. 
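
To make the distinction concrete, here is a toy Java model of the two modes. It is only a sketch and not Flink code: the class and method names are made up, the watermark is simplified to the largest event timestamp seen so far, and the timer is simulated from the records' arrival times. With an interval of 0 a watermark goes out with a record whenever it advances; with a positive interval it goes out only when the simulated timer fires.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy model (hypothetical, not Flink code) contrasting interval == 0 with interval > 0. */
final class WatermarkIntervalSketch {

    /**
     * Returns the watermarks emitted while consuming the records in order.
     * Each record is {eventTime, arrivalTime}, both in milliseconds.
     * intervalMs == 0: no timer; emit per record whenever the watermark advances.
     * intervalMs > 0: emit the current watermark only when the timer fires.
     */
    public static List<Long> emittedWatermarks(long[][] records, long intervalMs) {
        List<Long> emitted = new ArrayList<>();
        long current = Long.MIN_VALUE;     // simplified watermark: max event time seen
        long lastEmitted = Long.MIN_VALUE; // last watermark actually sent downstream
        long nextTimer = intervalMs;       // next timer firing, used only if intervalMs > 0

        for (long[] record : records) {
            long eventTime = record[0];
            long arrivalTime = record[1];

            if (intervalMs > 0) {
                // Fire every timer that is due before this record arrives.
                while (nextTimer <= arrivalTime) {
                    if (current > lastEmitted) {
                        emitted.add(current);
                        lastEmitted = current;
                    }
                    nextTimer += intervalMs;
                }
            }

            current = Math.max(current, eventTime);

            // Without a timer, emit right away, but only if the watermark advanced.
            if (intervalMs == 0 && current > lastEmitted) {
                emitted.add(current);
                lastEmitted = current;
            }
        }
        return emitted;
    }

    public static void main(String[] args) {
        long[][] records = {{10, 1}, {30, 2}, {20, 3}, {40, 250}};
        System.out.println(emittedWatermarks(records, 0));   // prints [10, 30, 40]
        System.out.println(emittedWatermarks(records, 200)); // prints [30]
    }
}
```

In the interval = 0 run, the out-of-order record with timestamp 20 produces no watermark because it does not advance past 30. In the 200 ms run, only the timer firing at 200 ms emits anything, and the final value 40 would go out at the next firing.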

Best,
Jark

On Tue, 23 Mar 2021 at 22:29, Dawid Wysakowicz <[hidden email]> wrote:

> Hey,
>
> I would like to double-check this with Jark and/or Timo. As far as DataStream is concerned, the Javadoc is correct. Moreover, pipeline.auto-watermark-interval and setAutoWatermarkInterval are effectively the same setting/option. However, I am not sure whether the Table API interprets it in the same way as the DataStream API. The documentation you linked, Aeden, describes the SQL API.
>
> @Jark @Timo Could you verify if the SQL documentation is correct?
>
> Best,
>
> Dawid


Re: pipeline.auto-watermark-interval vs setAutoWatermarkInterval

Matthias
Thanks for double-checking, Dawid, and thanks for clarifying, Jark. I will leave the Jira issue open, as Jark suggested improving the documentation in that sense.

Best,
Matthias

On Fri, Mar 26, 2021 at 7:43 AM Jark Wu <[hidden email]> wrote:
> IIUC, pipeline.auto-watermark-interval = 0 just disables **periodic** watermark emission;
> it doesn't mean that watermarks will never be emitted.
> In Table API/SQL, it has the same meaning. If the watermark interval is 0, we disable periodic watermark emission
> and emit a watermark once it advances.
>
> So I think the SQL documentation is correct.
>
> Best,
> Jark
