Re: Watermarks per key
Posted by
Aljoscha Krettek on
URL: http://deprecated-apache-flink-user-mailing-list-archive.369.s1.nabble.com/Watermarks-per-key-tp11628p11769.html
Hi,
managing a per-key watermark would require keeping to current watermark for each key, for example at the sources or in a timestamp/watermark assigner. The problem then is figuring out when you can discard that state because it would otherwise grow indefinitely if you have an evolving key space.
You can simulate per-key watermarks by having a wrapping type in your pipeline that carries the watermarks. Something like ValueOrWatermark<K, V> and then in your operators you manually manage the watermark, per-key in your own state. You would run into the same problem of the evolving key-space, however.
Cheers,
Aljoscha
There's nothing stopping me assigning timestamps and generating watermarks on
a keyed stream in the implementation and the KeyedStream API supports it. It
appears the underlying operator that gets created in
DataStream.assignTimestampsAndWatermarks() isn't key-aware and globally
tracks timestamps. So is that what's technically preventing assigning
timestamps per key from working?
I'm curious to hear Aljoscha's thoughts on watermark management across keys.
Thanks!
--
View this message in context: http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/Watermarks-per-key-tp11628p11761.html
Sent from the Apache Flink User Mailing List archive. mailing list archive at Nabble.com.