proper way to manage watermarks with messages combining multiple timestamps

classic Classic list List threaded Threaded
3 messages Options
Reply | Threaded
Open this post in threaded view
|

proper way to manage watermarks with messages combining multiple timestamps

Mathieu D
Hello,

I'm totally new to Flink, and I'd like to make sure I understand things properly around watermarks.

We're processing messages from iot devices.
Those messages have a timestamp, and we have a first phase of processing based on this timestamp. So far so good.

These messages actually "pack" together several measures taken at different times, typically going from ~15mn back in the past from the message timestamp, to a few seconds back.

So at a point in the processing, I'll flatMap the message stream into a stream of measures, and I'll first need to reaffect the event time. I guess I can do it using a TimestampAssigner, correct ?

The flatmapped stream will now mix together a large range of event-times (so, a span of 15mn). What should I do regarding the watermark ? Should I regenerate one ? and how ?

My measures will go through windowed aggregations. Should I use the allowedLateness param to manage that properly ?
(Note: I'm ok with windows firing several times with updated content, if that matters. Our downstream usage is made for that.)

Thanks a lot for your insights and pointers :-)

Mathieu



Reply | Threaded
Open this post in threaded view
|

Re: proper way to manage watermarks with messages combining multiple timestamps

Mathieu D
Hi,

I can't change the way devices send their data. We are constrained in the messages sent per day per device.

To illustrate my question:
- at 9:08 a message is emitted. It packs together several measures:
- measure m1 taken at 8:52
- measure m2 taken at 9:07

m1 must go in the 8:00-9:00 aggregation
m2 in the 9:00-10:00 aggregation

What's the proper way to set the watermarks in such a case ?

Thanks for your insights !

Mathieu

Le sam. 17 avr. 2021 à 07:05, Lasse Nedergaard <[hidden email]> a écrit :
Hi

One thing to remember is that Flinks watermark is global this mean it’s shared between all keys (in your case ioT Devices) so the first requirement your have is to ensure the timestamp is aligned or almost aligned between yours IoT devices if not Flink’s watermark is hard to use.

Med venlig hilsen / Best regards
Lasse Nedergaard


> Den 16. apr. 2021 kl. 18.29 skrev Mathieu D <[hidden email]>:
>
> 
> Hello,
>
> I'm totally new to Flink, and I'd like to make sure I understand things properly around watermarks.
>
> We're processing messages from iot devices.
> Those messages have a timestamp, and we have a first phase of processing based on this timestamp. So far so good.
>
> These messages actually "pack" together several measures taken at different times, typically going from ~15mn back in the past from the message timestamp, to a few seconds back.
>
> So at a point in the processing, I'll flatMap the message stream into a stream of measures, and I'll first need to reaffect the event time. I guess I can do it using a TimestampAssigner, correct ?
>
> The flatmapped stream will now mix together a large range of event-times (so, a span of 15mn). What should I do regarding the watermark ? Should I regenerate one ? and how ?
>
> My measures will go through windowed aggregations. Should I use the allowedLateness param to manage that properly ?
> (Note: I'm ok with windows firing several times with updated content, if that matters. Our downstream usage is made for that.)
>
> Thanks a lot for your insights and pointers :-)
>
> Mathieu
>
>
>
Reply | Threaded
Open this post in threaded view
|

Re: proper way to manage watermarks with messages combining multiple timestamps

Arvid Heise-4
Hi Mathieu,

The easiest way is to already emit several inputs on the source level. If you use DeserializationSchema, try to use the method with the collector. The watermarks should then be generated as if you would only receive one element at a time.

On Sun, Apr 18, 2021 at 11:08 AM Mathieu D <[hidden email]> wrote:
Hi,

I can't change the way devices send their data. We are constrained in the messages sent per day per device.

To illustrate my question:
- at 9:08 a message is emitted. It packs together several measures:
- measure m1 taken at 8:52
- measure m2 taken at 9:07

m1 must go in the 8:00-9:00 aggregation
m2 in the 9:00-10:00 aggregation

What's the proper way to set the watermarks in such a case ?

Thanks for your insights !

Mathieu

Le sam. 17 avr. 2021 à 07:05, Lasse Nedergaard <[hidden email]> a écrit :
Hi

One thing to remember is that Flinks watermark is global this mean it’s shared between all keys (in your case ioT Devices) so the first requirement your have is to ensure the timestamp is aligned or almost aligned between yours IoT devices if not Flink’s watermark is hard to use.

Med venlig hilsen / Best regards
Lasse Nedergaard


> Den 16. apr. 2021 kl. 18.29 skrev Mathieu D <[hidden email]>:
>
> 
> Hello,
>
> I'm totally new to Flink, and I'd like to make sure I understand things properly around watermarks.
>
> We're processing messages from iot devices.
> Those messages have a timestamp, and we have a first phase of processing based on this timestamp. So far so good.
>
> These messages actually "pack" together several measures taken at different times, typically going from ~15mn back in the past from the message timestamp, to a few seconds back.
>
> So at a point in the processing, I'll flatMap the message stream into a stream of measures, and I'll first need to reaffect the event time. I guess I can do it using a TimestampAssigner, correct ?
>
> The flatmapped stream will now mix together a large range of event-times (so, a span of 15mn). What should I do regarding the watermark ? Should I regenerate one ? and how ?
>
> My measures will go through windowed aggregations. Should I use the allowedLateness param to manage that properly ?
> (Note: I'm ok with windows firing several times with updated content, if that matters. Our downstream usage is made for that.)
>
> Thanks a lot for your insights and pointers :-)
>
> Mathieu
>
>
>