Testing / Configuring event windows with Table API and SQL

Colin Williams
Hello,

I've been given some Flink application code and asked to make sure that our query results are updated for late-arriving entries. We're currently creating a table using a tumbling-window SQL query similar to the first example in


We then turn the result table back into a DataStream using toAppendStream, and eventually add a derived stream to a sink. We've configured the TimeCharacteristic for event-time processing.
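
For readers following along, here is a minimal sketch of how a tumbling window assigns an event-time timestamp to a window. It is plain Java, not Flink code; the class and method names are hypothetical, and the one-minute window size is an assumption chosen only for illustration.

```java
public class TumblingWindowAssignment {

    // Start (inclusive) of the tumbling window that contains the timestamp.
    // floorMod keeps the result aligned even for negative timestamps.
    static long windowStart(long timestampMs, long sizeMs) {
        return timestampMs - Math.floorMod(timestampMs, sizeMs);
    }

    // End (exclusive) of the same window.
    static long windowEnd(long timestampMs, long sizeMs) {
        return windowStart(timestampMs, sizeMs) + sizeMs;
    }

    public static void main(String[] args) {
        long sizeMs = 60_000L; // assumed window size: one minute
        for (long ts : new long[] {0L, 59_999L, 60_000L, 125_000L}) {
            System.out.println(ts + " -> [" + windowStart(ts, sizeMs)
                    + ", " + windowEnd(ts, sizeMs) + ")");
        }
    }
}
```

Every event belongs to exactly one window, so whether it counts as "late" depends only on whether the watermark has already passed the end of that window.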

From reading the documentation, I tried configuring withIdleStateRetentionTime, expecting that this setting would let me handle events that arrive after the watermark has passed their window but still within the retention time.

To test this, I created a simple source that emits a watermark and then a late arrival. However, so far the watermark seems to cause the late arrival to be discarded: my test sink, where I try to capture every emitted output, never receives the updated value I was hoping for.

So it seems that either my understanding of how to deal with late events or my test setup is wrong. Can anyone see what I'm doing wrong?


Best,

Colin Williams



Re: Testing / Configuring event windows with Table API and SQL

Fabian Hueske-2
Hi Colin,

Flink's SQL runner does not support handling of late data yet. At the moment, late events are simply dropped.
We plan to add support for late data in a future release.
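
The dropping behaviour described above can be illustrated with a toy model. This is a plain-Java sketch, not Flink's implementation: the class `LateEventDropSketch` and its methods are hypothetical, and it assumes a window is considered complete once the watermark reaches the window's last timestamp (end minus one millisecond).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class LateEventDropSketch {
    private final long sizeMs;
    // Open windows: window start -> number of events counted so far.
    private final TreeMap<Long, Long> openWindows = new TreeMap<>();
    private final List<String> emitted = new ArrayList<>();
    private final List<Long> dropped = new ArrayList<>();
    private long watermark = Long.MIN_VALUE;

    LateEventDropSketch(long sizeMs) {
        this.sizeMs = sizeMs;
    }

    // Count the event in its window, or drop it if that window is already complete.
    void onEvent(long timestampMs) {
        long start = timestampMs - Math.floorMod(timestampMs, sizeMs);
        long lastTimestamp = start + sizeMs - 1;
        if (lastTimestamp <= watermark) {
            dropped.add(timestampMs); // late: no update is ever emitted for it
            return;
        }
        openWindows.merge(start, 1L, Long::sum);
    }

    // Advance the watermark and emit every window it has passed, exactly once.
    void onWatermark(long newWatermark) {
        watermark = Math.max(watermark, newWatermark);
        while (!openWindows.isEmpty()
                && openWindows.firstKey() + sizeMs - 1 <= watermark) {
            Map.Entry<Long, Long> w = openWindows.pollFirstEntry();
            emitted.add("[" + w.getKey() + "," + (w.getKey() + sizeMs)
                    + ") count=" + w.getValue());
        }
    }

    List<String> emitted() { return emitted; }
    List<Long> dropped() { return dropped; }

    public static void main(String[] args) {
        LateEventDropSketch windows = new LateEventDropSketch(10_000L);
        windows.onEvent(1_000L);
        windows.onEvent(4_000L);
        windows.onWatermark(10_000L); // closes [0,10000)
        windows.onEvent(7_000L);      // late arrival for the closed window
        windows.onEvent(12_000L);
        windows.onWatermark(20_000L); // closes [10000,20000)
        System.out.println("emitted: " + windows.emitted());
        System.out.println("dropped: " + windows.dropped());
    }
}
```

In this model the test sink sees one result per window and never an updated value for the first window, which matches what the test in the original question observed.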

The "withIdleStateRetentionTime" parameter only applies to non-windowed aggregation functions and controls when they can evict state for inactive keys.
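
To make the distinction concrete, here is a toy model of idle state retention for a non-windowed, per-key count. It is a plain-Java sketch under simplifying assumptions: a single retention time instead of a minimum and maximum, and eviction checked on every update. The class `IdleStateRetentionSketch` and its methods are hypothetical and are not Flink's API.

```java
import java.util.HashMap;
import java.util.Map;

public class IdleStateRetentionSketch {

    private static final class KeyState {
        long count;
        long lastUpdateMs;
    }

    private final long retentionMs;
    private final Map<String, KeyState> state = new HashMap<>();

    IdleStateRetentionSketch(long retentionMs) {
        this.retentionMs = retentionMs;
    }

    // Increment the running count for a key and return the new value.
    long update(String key, long nowMs) {
        evictIdle(nowMs);
        KeyState s = state.computeIfAbsent(key, k -> new KeyState());
        s.count++;
        s.lastUpdateMs = nowMs;
        return s.count;
    }

    // Remove state for keys that have not been updated within the retention time.
    void evictIdle(long nowMs) {
        state.values().removeIf(s -> nowMs - s.lastUpdateMs >= retentionMs);
    }

    int keysHeld() {
        return state.size();
    }

    public static void main(String[] args) {
        IdleStateRetentionSketch agg = new IdleStateRetentionSketch(60_000L);
        System.out.println(agg.update("a", 0L));      // 1
        System.out.println(agg.update("a", 30_000L)); // 2
        System.out.println(agg.update("b", 40_000L)); // 1
        // "a" has been idle for 65 s, so its state is evicted and the count restarts.
        System.out.println(agg.update("a", 95_000L)); // 1
    }
}
```

The retention time here governs only how long per-key state is kept while a key is inactive. It has no notion of windows or watermarks, so it cannot reopen a window that has already been emitted.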

Best, Fabian


2017-11-10 4:17 GMT+01:00 Colin Williams <[hidden email]>: