Hello, I'm totally new to Flink, and I'd like to make sure I understand things properly around watermarks. We're processing messages from iot devices. Those messages have a timestamp, and we have a first phase of processing based on this timestamp. So far so good. These messages actually "pack" together several measures taken at different times, typically going from ~15mn back in the past from the message timestamp, to a few seconds back. So at a point in the processing, I'll flatMap the message stream into a stream of measures, and I'll first need to reaffect the event time. I guess I can do it using a TimestampAssigner, correct ? The flatmapped stream will now mix together a large range of event-times (so, a span of 15mn). What should I do regarding the watermark ? Should I regenerate one ? and how ? My measures will go through windowed aggregations. Should I use the allowedLateness param to manage that properly ? (Note: I'm ok with windows firing several times with updated content, if that matters. Our downstream usage is made for that.) Thanks a lot for your insights and pointers :-) Mathieu |
Hi, I can't change the way devices send their data. We are constrained in the messages sent per day per device. To illustrate my question: - at 9:08 a message is emitted. It packs together several measures: - measure m1 taken at 8:52 - measure m2 taken at 9:07 m1 must go in the 8:00-9:00 aggregation m2 in the 9:00-10:00 aggregation What's the proper way to set the watermarks in such a case ? Thanks for your insights ! Mathieu Le sam. 17 avr. 2021 à 07:05, Lasse Nedergaard <[hidden email]> a écrit : Hi |
Hi Mathieu, The easiest way is to already emit several inputs on the source level. If you use DeserializationSchema, try to use the method with the collector. The watermarks should then be generated as if you would only receive one element at a time. On Sun, Apr 18, 2021 at 11:08 AM Mathieu D <[hidden email]> wrote:
|
Free forum by Nabble | Edit this page |