This is just a question about a potential use case for Flink: I have a Flink job which receives tuples with an event id and a timestamp (e, t) and maps them into a stream (e, t2) where t2 is a future timestamp (up to 1 year in the future, which indicates when to schedule a transformation of e). I then want to key by e and keep track of the max t2 for each e. Now the tricky bit: I want to periodically, say every minute (in event time world) take all (e, t2) where t2 occurred in the last minute, do a transformation and emit the result. It is important that the final transformation happens after t2 (preferably as soon as possible, but a delay of minutes is fine). Is it possible to use Flink's windowing and watermark mechanics to achieve this? I want to maintain a large state for the (e, t2) window, e.g. over a year (probably too large to fit in memory). And somehow use watermarks to execute the scheduled transformations. If anyone has any views on how this could be done, (or whether it's even possible/a good idea to do) with Flink then it would be great to hear! Thanks, Josh |
Hi Josh, I'll have to think a bit about that one. Once I have something I'll get back to you. Best, Aljoscha On Wed, 8 Jun 2016 at 21:47 Josh <[hidden email]> wrote:
|
Ok, thanks Aljoscha. As an alternative to using Flink to maintain the schedule state, I could take the (e, t2) stream and write to a external key-value store with a bucket for each minute. Then have a separate service which polls the key-value store every minute and retrieves the current bucket, and does the final transformation. I just thought there might be a nicer way to do it using Flink! On Thu, Jun 9, 2016 at 2:23 PM, Aljoscha Krettek <[hidden email]> wrote:
|
Free forum by Nabble | Edit this page |