Apache Flink - A question about Tables API and SQL interfaces

classic Classic list List threaded Threaded
7 messages Options
Reply | Threaded
Open this post in threaded view
|

Apache Flink - A question about Tables API and SQL interfaces

M Singh
Hey Folks:

I have the following questions regarding Table API/SQL in streaming mode:

1. Is there is a notion triggers/evictors/timers when using Table API or SQL interfaces ?
2. Is there anything like side outputs and ability to define allowed lateness when dealing with the Table API or SQL interfaces ?

If there are any alternate ways for the above when using Table API or SQL, please let me know where I can find the relevant documentation/examples.

Thanks for your help.

Mans




Reply | Threaded
Open this post in threaded view
|

Re: Apache Flink - A question about Tables API and SQL interfaces

austin.ce
Hi Mans,

I don't believe there are explicit triggers/evictors/timers in the Table API/ SQL, as that is abstracted away from the lower-level DataStream API. If you need to get into the fine-grained details, Flink 1.13 has made some good improvements in going from the Table API to the DataStream API, and back again. [1]

For working with time and lateness with Table API and SQL, some good places to look are the GroupBy Window Aggregation section of the Table API docs[2], as well as the SQL cookbook[3] and Ververica's SQL training wiki[4].

Hope that helps,
Austin


On Wed, May 12, 2021 at 1:30 PM M Singh <[hidden email]> wrote:
Hey Folks:

I have the following questions regarding Table API/SQL in streaming mode:

1. Is there is a notion triggers/evictors/timers when using Table API or SQL interfaces ?
2. Is there anything like side outputs and ability to define allowed lateness when dealing with the Table API or SQL interfaces ?

If there are any alternate ways for the above when using Table API or SQL, please let me know where I can find the relevant documentation/examples.

Thanks for your help.

Mans




Reply | Threaded
Open this post in threaded view
|

Re: Apache Flink - A question about Tables API and SQL interfaces

M Singh
Thanks Austin for your helpful references.

I did take a look at [2]/[3] - but did not find anything relevant on searching for string 'late' (for allowed lateness etc) or side output.  So from my understanding the late events will be dropped if I am using Table API or SQL and the only option is to use datastream interface.  Please let me know if I missed anything.

Thanks again.


On Wednesday, May 12, 2021, 04:36:06 PM EDT, Austin Cawley-Edwards <[hidden email]> wrote:


Hi Mans,

I don't believe there are explicit triggers/evictors/timers in the Table API/ SQL, as that is abstracted away from the lower-level DataStream API. If you need to get into the fine-grained details, Flink 1.13 has made some good improvements in going from the Table API to the DataStream API, and back again. [1]

For working with time and lateness with Table API and SQL, some good places to look are the GroupBy Window Aggregation section of the Table API docs[2], as well as the SQL cookbook[3] and Ververica's SQL training wiki[4].

Hope that helps,
Austin


On Wed, May 12, 2021 at 1:30 PM M Singh <[hidden email]> wrote:
Hey Folks:

I have the following questions regarding Table API/SQL in streaming mode:

1. Is there is a notion triggers/evictors/timers when using Table API or SQL interfaces ?
2. Is there anything like side outputs and ability to define allowed lateness when dealing with the Table API or SQL interfaces ?

If there are any alternate ways for the above when using Table API or SQL, please let me know where I can find the relevant documentation/examples.

Thanks for your help.

Mans




Reply | Threaded
Open this post in threaded view
|

Re: Apache Flink - A question about Tables API and SQL interfaces

austin.ce
Hi Mans,

There are currently no public APIs for doing so, though if you're willing to deal with some breaking changes there are some experimental config options for late events in the Table API and SQL, seen in the WIndowEmitStrategy class[1].

Best,
Austin


On Wed, May 12, 2021 at 5:12 PM M Singh <[hidden email]> wrote:
Thanks Austin for your helpful references.

I did take a look at [2]/[3] - but did not find anything relevant on searching for string 'late' (for allowed lateness etc) or side output.  So from my understanding the late events will be dropped if I am using Table API or SQL and the only option is to use datastream interface.  Please let me know if I missed anything.

Thanks again.


On Wednesday, May 12, 2021, 04:36:06 PM EDT, Austin Cawley-Edwards <[hidden email]> wrote:


Hi Mans,

I don't believe there are explicit triggers/evictors/timers in the Table API/ SQL, as that is abstracted away from the lower-level DataStream API. If you need to get into the fine-grained details, Flink 1.13 has made some good improvements in going from the Table API to the DataStream API, and back again. [1]

For working with time and lateness with Table API and SQL, some good places to look are the GroupBy Window Aggregation section of the Table API docs[2], as well as the SQL cookbook[3] and Ververica's SQL training wiki[4].

Hope that helps,
Austin


On Wed, May 12, 2021 at 1:30 PM M Singh <[hidden email]> wrote:
Hey Folks:

I have the following questions regarding Table API/SQL in streaming mode:

1. Is there is a notion triggers/evictors/timers when using Table API or SQL interfaces ?
2. Is there anything like side outputs and ability to define allowed lateness when dealing with the Table API or SQL interfaces ?

If there are any alternate ways for the above when using Table API or SQL, please let me know where I can find the relevant documentation/examples.

Thanks for your help.

Mans




Reply | Threaded
Open this post in threaded view
|

Re: Apache Flink - A question about Tables API and SQL interfaces

JING ZHANG
Hi Mans,
     +1 for Austin's reply.
     I would like to add something about "allow lateness".
     After introduce Windowing table-valued function in Flink 1.13,
User could use two SQL solution to do window aggregate. And 'allow
lateness' behavior is different in these two solutions.
    1. If adopt windowing tvf window aggregate [2], 'allow lateness'
is not supported yet.
    2. If adopt legacy Grouped Window Functions [1], 'allow lateness'
is supported. However, you should use the feature with caution since
it depends on state retention configuration (`table.exec.state.ttl`
[3]), especially if a job contains many operator except for window
aggregate. Please see JIRA-21301 [4]. It maybe could be resolved in
Flink-1.14.

Best,
beyond1920

[1]: https://ci.apache.org/projects/flink/flink-docs-release-1.13/docs/dev/table/sql/queries/window-agg/#group-window-aggregation-deprecated
[2]: https://ci.apache.org/projects/flink/flink-docs-release-1.13/docs/dev/table/sql/queries/window-tvf/
[3]: https://ci.apache.org/projects/flink/flink-docs-release-1.13/docs/dev/table/config/
[4]:https://issues.apache.org/jira/browse/FLINK-21301

Austin Cawley-Edwards <[hidden email]> 于2021年5月13日周四 下午9:57写道:

>
> Hi Mans,
>
> There are currently no public APIs for doing so, though if you're willing to deal with some breaking changes there are some experimental config options for late events in the Table API and SQL, seen in the WIndowEmitStrategy class[1].
>
> Best,
> Austin
>
> [1]: https://github.com/apache/flink/blob/607919c6ea6983ae5ad3f82d63b7d6455c73d225/flink-table/flink-table-planner-blink/src/main/scala/org/apache/flink/table/planner/plan/utils/WindowEmitStrategy.scala#L173-L211
>
> On Wed, May 12, 2021 at 5:12 PM M Singh <[hidden email]> wrote:
>>
>> Thanks Austin for your helpful references.
>>
>> I did take a look at [2]/[3] - but did not find anything relevant on searching for string 'late' (for allowed lateness etc) or side output.  So from my understanding the late events will be dropped if I am using Table API or SQL and the only option is to use datastream interface.  Please let me know if I missed anything.
>>
>> Thanks again.
>>
>>
>> On Wednesday, May 12, 2021, 04:36:06 PM EDT, Austin Cawley-Edwards <[hidden email]> wrote:
>>
>>
>> Hi Mans,
>>
>> I don't believe there are explicit triggers/evictors/timers in the Table API/ SQL, as that is abstracted away from the lower-level DataStream API. If you need to get into the fine-grained details, Flink 1.13 has made some good improvements in going from the Table API to the DataStream API, and back again. [1]
>>
>> For working with time and lateness with Table API and SQL, some good places to look are the GroupBy Window Aggregation section of the Table API docs[2], as well as the SQL cookbook[3] and Ververica's SQL training wiki[4].
>>
>> Hope that helps,
>> Austin
>>
>> [1]: https://flink.apache.org/news/2021/05/03/release-1.13.0.html#improved-interoperability-between-datastream-api-and-table-apisql
>> [2]: https://ci.apache.org/projects/flink/flink-docs-release-1.13/docs/dev/table/tableapi/#groupby-window-aggregation
>> [3]: https://github.com/ververica/flink-sql-cookbook#aggregations-and-analytics
>> [4]: https://github.com/ververica/sql-training/wiki/Queries-and-Time
>>
>> On Wed, May 12, 2021 at 1:30 PM M Singh <[hidden email]> wrote:
>>
>> Hey Folks:
>>
>> I have the following questions regarding Table API/SQL in streaming mode:
>>
>> 1. Is there is a notion triggers/evictors/timers when using Table API or SQL interfaces ?
>> 2. Is there anything like side outputs and ability to define allowed lateness when dealing with the Table API or SQL interfaces ?
>>
>> If there are any alternate ways for the above when using Table API or SQL, please let me know where I can find the relevant documentation/examples.
>>
>> Thanks for your help.
>>
>> Mans
>>
>>
>>
>>
Reply | Threaded
Open this post in threaded view
|

Re: Apache Flink - A question about Tables API and SQL interfaces

Theo
Hi Mans,

Regarding to your first question: I bookmarked the following mailing list discussion a while ago [1].

Fabian Hueske as one of the major contributors to Flink answered that there aren't yet any trigger semantics in Flink SQL, but linked a great idea with a SQL extension of "EMIT".

I read each Flink release notes and hope this idea is going to be implemented, but as far as I know, there wasn't any progress on this over the last years.

Best regards
Theo


[1] http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/SQL-Do-Not-Support-Custom-Trigger-td20932.html 

----- Ursprüngliche Mail -----
Von: "张静" <[hidden email]>
An: "Austin Cawley-Edwards" <[hidden email]>
CC: "M Singh" <[hidden email]>, "user" <[hidden email]>
Gesendet: Freitag, 14. Mai 2021 06:06:33
Betreff: Re: Apache Flink - A question about Tables API and SQL interfaces

Hi Mans,
     +1 for Austin's reply.
     I would like to add something about "allow lateness".
     After introduce Windowing table-valued function in Flink 1.13,
User could use two SQL solution to do window aggregate. And 'allow
lateness' behavior is different in these two solutions.
    1. If adopt windowing tvf window aggregate [2], 'allow lateness'
is not supported yet.
    2. If adopt legacy Grouped Window Functions [1], 'allow lateness'
is supported. However, you should use the feature with caution since
it depends on state retention configuration (`table.exec.state.ttl`
[3]), especially if a job contains many operator except for window
aggregate. Please see JIRA-21301 [4]. It maybe could be resolved in
Flink-1.14.

Best,
beyond1920

[1]: https://ci.apache.org/projects/flink/flink-docs-release-1.13/docs/dev/table/sql/queries/window-agg/#group-window-aggregation-deprecated
[2]: https://ci.apache.org/projects/flink/flink-docs-release-1.13/docs/dev/table/sql/queries/window-tvf/
[3]: https://ci.apache.org/projects/flink/flink-docs-release-1.13/docs/dev/table/config/
[4]:https://issues.apache.org/jira/browse/FLINK-21301

Austin Cawley-Edwards <[hidden email]> 于2021年5月13日周四 下午9:57写道:

>
> Hi Mans,
>
> There are currently no public APIs for doing so, though if you're willing to deal with some breaking changes there are some experimental config options for late events in the Table API and SQL, seen in the WIndowEmitStrategy class[1].
>
> Best,
> Austin
>
> [1]: https://github.com/apache/flink/blob/607919c6ea6983ae5ad3f82d63b7d6455c73d225/flink-table/flink-table-planner-blink/src/main/scala/org/apache/flink/table/planner/plan/utils/WindowEmitStrategy.scala#L173-L211
>
> On Wed, May 12, 2021 at 5:12 PM M Singh <[hidden email]> wrote:
>>
>> Thanks Austin for your helpful references.
>>
>> I did take a look at [2]/[3] - but did not find anything relevant on searching for string 'late' (for allowed lateness etc) or side output.  So from my understanding the late events will be dropped if I am using Table API or SQL and the only option is to use datastream interface.  Please let me know if I missed anything.
>>
>> Thanks again.
>>
>>
>> On Wednesday, May 12, 2021, 04:36:06 PM EDT, Austin Cawley-Edwards <[hidden email]> wrote:
>>
>>
>> Hi Mans,
>>
>> I don't believe there are explicit triggers/evictors/timers in the Table API/ SQL, as that is abstracted away from the lower-level DataStream API. If you need to get into the fine-grained details, Flink 1.13 has made some good improvements in going from the Table API to the DataStream API, and back again. [1]
>>
>> For working with time and lateness with Table API and SQL, some good places to look are the GroupBy Window Aggregation section of the Table API docs[2], as well as the SQL cookbook[3] and Ververica's SQL training wiki[4].
>>
>> Hope that helps,
>> Austin
>>
>> [1]: https://flink.apache.org/news/2021/05/03/release-1.13.0.html#improved-interoperability-between-datastream-api-and-table-apisql
>> [2]: https://ci.apache.org/projects/flink/flink-docs-release-1.13/docs/dev/table/tableapi/#groupby-window-aggregation
>> [3]: https://github.com/ververica/flink-sql-cookbook#aggregations-and-analytics
>> [4]: https://github.com/ververica/sql-training/wiki/Queries-and-Time
>>
>> On Wed, May 12, 2021 at 1:30 PM M Singh <[hidden email]> wrote:
>>
>> Hey Folks:
>>
>> I have the following questions regarding Table API/SQL in streaming mode:
>>
>> 1. Is there is a notion triggers/evictors/timers when using Table API or SQL interfaces ?
>> 2. Is there anything like side outputs and ability to define allowed lateness when dealing with the Table API or SQL interfaces ?
>>
>> If there are any alternate ways for the above when using Table API or SQL, please let me know where I can find the relevant documentation/examples.
>>
>> Thanks for your help.
>>
>> Mans
>>
>>
>>
>>
Reply | Threaded
Open this post in threaded view
|

Re: Apache Flink - A question about Tables API and SQL interfaces

Ingo Bürk
Hi everyone,

there is also [1] to introduce a CURRENT_WATERMARK function in SQL which can help in dealing with late events. Maybe that's interesting here as well.



Regards
Ingo

On Sun, May 30, 2021 at 5:31 PM Theo Diefenthal <[hidden email]> wrote:
Hi Mans,

Regarding to your first question: I bookmarked the following mailing list discussion a while ago [1].

Fabian Hueske as one of the major contributors to Flink answered that there aren't yet any trigger semantics in Flink SQL, but linked a great idea with a SQL extension of "EMIT".

I read each Flink release notes and hope this idea is going to be implemented, but as far as I know, there wasn't any progress on this over the last years.

Best regards
Theo


[1] http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/SQL-Do-Not-Support-Custom-Trigger-td20932.html

----- Ursprüngliche Mail -----
Von: "张静" <[hidden email]>
An: "Austin Cawley-Edwards" <[hidden email]>
CC: "M Singh" <[hidden email]>, "user" <[hidden email]>
Gesendet: Freitag, 14. Mai 2021 06:06:33
Betreff: Re: Apache Flink - A question about Tables API and SQL interfaces

Hi Mans,
     +1 for Austin's reply.
     I would like to add something about "allow lateness".
     After introduce Windowing table-valued function in Flink 1.13,
User could use two SQL solution to do window aggregate. And 'allow
lateness' behavior is different in these two solutions.
    1. If adopt windowing tvf window aggregate [2], 'allow lateness'
is not supported yet.
    2. If adopt legacy Grouped Window Functions [1], 'allow lateness'
is supported. However, you should use the feature with caution since
it depends on state retention configuration (`table.exec.state.ttl`
[3]), especially if a job contains many operator except for window
aggregate. Please see JIRA-21301 [4]. It maybe could be resolved in
Flink-1.14.

Best,
beyond1920

[1]: https://ci.apache.org/projects/flink/flink-docs-release-1.13/docs/dev/table/sql/queries/window-agg/#group-window-aggregation-deprecated
[2]: https://ci.apache.org/projects/flink/flink-docs-release-1.13/docs/dev/table/sql/queries/window-tvf/
[3]: https://ci.apache.org/projects/flink/flink-docs-release-1.13/docs/dev/table/config/
[4]:https://issues.apache.org/jira/browse/FLINK-21301

Austin Cawley-Edwards <[hidden email]> 于2021年5月13日周四 下午9:57写道:
>
> Hi Mans,
>
> There are currently no public APIs for doing so, though if you're willing to deal with some breaking changes there are some experimental config options for late events in the Table API and SQL, seen in the WIndowEmitStrategy class[1].
>
> Best,
> Austin
>
> [1]: https://github.com/apache/flink/blob/607919c6ea6983ae5ad3f82d63b7d6455c73d225/flink-table/flink-table-planner-blink/src/main/scala/org/apache/flink/table/planner/plan/utils/WindowEmitStrategy.scala#L173-L211
>
> On Wed, May 12, 2021 at 5:12 PM M Singh <[hidden email]> wrote:
>>
>> Thanks Austin for your helpful references.
>>
>> I did take a look at [2]/[3] - but did not find anything relevant on searching for string 'late' (for allowed lateness etc) or side output.  So from my understanding the late events will be dropped if I am using Table API or SQL and the only option is to use datastream interface.  Please let me know if I missed anything.
>>
>> Thanks again.
>>
>>
>> On Wednesday, May 12, 2021, 04:36:06 PM EDT, Austin Cawley-Edwards <[hidden email]> wrote:
>>
>>
>> Hi Mans,
>>
>> I don't believe there are explicit triggers/evictors/timers in the Table API/ SQL, as that is abstracted away from the lower-level DataStream API. If you need to get into the fine-grained details, Flink 1.13 has made some good improvements in going from the Table API to the DataStream API, and back again. [1]
>>
>> For working with time and lateness with Table API and SQL, some good places to look are the GroupBy Window Aggregation section of the Table API docs[2], as well as the SQL cookbook[3] and Ververica's SQL training wiki[4].
>>
>> Hope that helps,
>> Austin
>>
>> [1]: https://flink.apache.org/news/2021/05/03/release-1.13.0.html#improved-interoperability-between-datastream-api-and-table-apisql
>> [2]: https://ci.apache.org/projects/flink/flink-docs-release-1.13/docs/dev/table/tableapi/#groupby-window-aggregation
>> [3]: https://github.com/ververica/flink-sql-cookbook#aggregations-and-analytics
>> [4]: https://github.com/ververica/sql-training/wiki/Queries-and-Time
>>
>> On Wed, May 12, 2021 at 1:30 PM M Singh <[hidden email]> wrote:
>>
>> Hey Folks:
>>
>> I have the following questions regarding Table API/SQL in streaming mode:
>>
>> 1. Is there is a notion triggers/evictors/timers when using Table API or SQL interfaces ?
>> 2. Is there anything like side outputs and ability to define allowed lateness when dealing with the Table API or SQL interfaces ?
>>
>> If there are any alternate ways for the above when using Table API or SQL, please let me know where I can find the relevant documentation/examples.
>>
>> Thanks for your help.
>>
>> Mans
>>
>>
>>
>>