Support for custom triggers in Table / SQL

classic Classic list List threaded Threaded
2 messages Options
Reply | Threaded
Open this post in threaded view
|

Support for custom triggers in Table / SQL

Piyush Narang

Hi folks,

 

I’m trying to write a Flink job that computes a bunch of counters which requires custom triggers and I was trying to figure out the best way to express that.

 

The query looks something like this:

SELECT userId, customUDAF(...) AS counter1, customUDAF(...) AS counter2, ...

FROM (

SELECT * FROM my_kafka_stream

)

GROUP BY userId, HOP(`timestamp`, INTERVAL '6' HOUR, INTERVAL '7' DAY)

 

We sink this to a KV store (memcache / couchbase) for persistence.

 

Some of these counters end up spanning a pretty wide time window (longest is 7 days) and if we want to keep the state tractable we have to have a pretty large slide interval (6 hours or greater). A requirement that some of our users have is for counters to be updated fairly frequently (e.g. every min) so we were looking at how to achieve that with the Table / SQL api. I see that this is possible using the custom triggers support if we were to use the Datastream api but I’m wondering if this is possible using the Table / SQL apis.

 

I did see another thread where Fabian brought up this design doc which has listed what support for emit triggers would look like (in various streaming platforms). Is this something that is being actively worked on? If not, any suggestions on how we could get the ball rolling on this? (google doc design + jira?)

 

Thanks,

 

-- Piyush

 

Reply | Threaded
Open this post in threaded view
|

Re: Support for custom triggers in Table / SQL

Fabian Hueske-2
Hi Piyush,

Custom triggers (or early firing) is currently not supported by SQL or the Table API.
It is also not on the roadmap [1].
Currently, most efforts on the relational API are focused on restructuring the code and working towards the integration of the Blink contribution [2].

AFAIK, there are no concrete plans to work on support for early results in SQL.
You'd need to use the DataStream API for this use case.

Best, Fabian


Am Do., 28. März 2019 um 12:32 Uhr schrieb Piyush Narang <[hidden email]>:

Hi folks,

 

I’m trying to write a Flink job that computes a bunch of counters which requires custom triggers and I was trying to figure out the best way to express that.

 

The query looks something like this:

SELECT userId, customUDAF(...) AS counter1, customUDAF(...) AS counter2, ...

FROM (

SELECT * FROM my_kafka_stream

)

GROUP BY userId, HOP(`timestamp`, INTERVAL '6' HOUR, INTERVAL '7' DAY)

 

We sink this to a KV store (memcache / couchbase) for persistence.

 

Some of these counters end up spanning a pretty wide time window (longest is 7 days) and if we want to keep the state tractable we have to have a pretty large slide interval (6 hours or greater). A requirement that some of our users have is for counters to be updated fairly frequently (e.g. every min) so we were looking at how to achieve that with the Table / SQL api. I see that this is possible using the custom triggers support if we were to use the Datastream api but I’m wondering if this is possible using the Table / SQL apis.

 

I did see another thread where Fabian brought up this design doc which has listed what support for emit triggers would look like (in various streaming platforms). Is this something that is being actively worked on? If not, any suggestions on how we could get the ball rolling on this? (google doc design + jira?)

 

Thanks,

 

-- Piyush