Using Flink watermarks and a large window state for scheduling

classic Classic list List threaded Threaded
3 messages Options
Reply | Threaded
Open this post in threaded view
|

Using Flink watermarks and a large window state for scheduling

Josh

This is just a question about a potential use case for Flink:

I have a Flink job which receives tuples with an event id and a timestamp (e, t) and maps them into a stream (e, t2) where t2 is a future timestamp (up to 1 year in the future, which indicates when to schedule a transformation of e). I then want to key by e and keep track of the max t2 for each e. Now the tricky bit: I want to periodically, say every minute (in event time world) take all (e, t2) where t2 occurred in the last minute, do a transformation and emit the result. It is important that the final transformation happens after t2 (preferably as soon as possible, but a delay of minutes is fine).

Is it possible to use Flink's windowing and watermark mechanics to achieve this? I want to maintain a large state for the (e, t2) window, e.g. over a year (probably too large to fit in memory). And somehow use watermarks to execute the scheduled transformations. 

If anyone has any views on how this could be done, (or whether it's even possible/a good idea to do) with Flink then it would be great to hear!

Thanks,

Josh

Reply | Threaded
Open this post in threaded view
|

Re: Using Flink watermarks and a large window state for scheduling

Aljoscha Krettek
Hi Josh,
I'll have to think a bit about that one. Once I have something I'll get back to you.

Best,
Aljoscha

On Wed, 8 Jun 2016 at 21:47 Josh <[hidden email]> wrote:

This is just a question about a potential use case for Flink:

I have a Flink job which receives tuples with an event id and a timestamp (e, t) and maps them into a stream (e, t2) where t2 is a future timestamp (up to 1 year in the future, which indicates when to schedule a transformation of e). I then want to key by e and keep track of the max t2 for each e. Now the tricky bit: I want to periodically, say every minute (in event time world) take all (e, t2) where t2 occurred in the last minute, do a transformation and emit the result. It is important that the final transformation happens after t2 (preferably as soon as possible, but a delay of minutes is fine).

Is it possible to use Flink's windowing and watermark mechanics to achieve this? I want to maintain a large state for the (e, t2) window, e.g. over a year (probably too large to fit in memory). And somehow use watermarks to execute the scheduled transformations. 

If anyone has any views on how this could be done, (or whether it's even possible/a good idea to do) with Flink then it would be great to hear!

Thanks,

Josh

Reply | Threaded
Open this post in threaded view
|

Re: Using Flink watermarks and a large window state for scheduling

Josh
Ok, thanks Aljoscha.

As an alternative to using Flink to maintain the schedule state, I could take the (e, t2) stream and write to a external key-value store with a bucket for each minute. Then have a separate service which polls the key-value store every minute and retrieves the current bucket, and does the final transformation.

I just thought there might be a nicer way to do it using Flink!

On Thu, Jun 9, 2016 at 2:23 PM, Aljoscha Krettek <[hidden email]> wrote:
Hi Josh,
I'll have to think a bit about that one. Once I have something I'll get back to you.

Best,
Aljoscha

On Wed, 8 Jun 2016 at 21:47 Josh <[hidden email]> wrote:

This is just a question about a potential use case for Flink:

I have a Flink job which receives tuples with an event id and a timestamp (e, t) and maps them into a stream (e, t2) where t2 is a future timestamp (up to 1 year in the future, which indicates when to schedule a transformation of e). I then want to key by e and keep track of the max t2 for each e. Now the tricky bit: I want to periodically, say every minute (in event time world) take all (e, t2) where t2 occurred in the last minute, do a transformation and emit the result. It is important that the final transformation happens after t2 (preferably as soon as possible, but a delay of minutes is fine).

Is it possible to use Flink's windowing and watermark mechanics to achieve this? I want to maintain a large state for the (e, t2) window, e.g. over a year (probably too large to fit in memory). And somehow use watermarks to execute the scheduled transformations. 

If anyone has any views on how this could be done, (or whether it's even possible/a good idea to do) with Flink then it would be great to hear!

Thanks,

Josh