Generating processing time watermarks in idle event time kafka streams?

Generating processing time watermarks in idle event time kafka streams?

William Saar-2
Are there any standardized components for generating watermarks based on processing time in an event-time stream when a source produces no data?

The docs for event time [1] indicate that people are doing this, but the only suggestion on Stack Overflow [2] is to give every window operator in the stream a special processing-time trigger, which seems clumsy and error-prone if you have many jobs and many windows.

I'm thinking there might be a watermark generator that can be plugged into the Kafka consumer so that it generates processing-time watermarks when there are no messages. Or maybe there is a standardized operator that can be inserted at the beginning of each stream, just after the source, to generate such idle-source processing-time watermarks in event-time streams?

[1] "Note that sometimes when event time programs are processing live data in real-time, they will use some processing time operations in order to guarantee that they are progressing in a timely fashion."
[2] https://stackoverflow.com/questions/46432368/flink-window-does-not-process-data-at-end-of-stream



Re: Generating processing time watermarks in idle event time kafka streams?

Aljoscha Krettek
Hi William,

There is currently no official way of doing this, but the Flink community will be working on it as part of an upcoming source refactoring.

For now, there is this watermark extractor that I once wrote: https://github.com/aljoscha/flink/commit/6e4419e550caa0e5b162bc0d2ccc43f6b0b3860f

Best,
Aljoscha
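
For readers who want a feel for what such an extractor might look like, here is a minimal sketch of the idea. It is not the code from the linked commit, and the class name, method names and parameters are all hypothetical. It has no Flink dependency; in a Flink job of that era, the two methods would roughly play the roles of `extractTimestamp` and `getCurrentWatermark` in an `AssignerWithPeriodicWatermarks`. The sketch assumes event timestamps are close to wall-clock time, so that falling back to processing time while idle is meaningful.

```java
import java.util.function.LongSupplier;

/**
 * Sketch of an idle-aware periodic watermark generator (hypothetical, not a Flink class).
 * While events arrive, the watermark trails the highest event timestamp seen.
 * Once no event has arrived for idleTimeoutMs, the watermark trails the wall clock instead.
 */
final class IdleAwareWatermarkGenerator {
    private final long maxOutOfOrdernessMs;
    private final long idleTimeoutMs;
    private final LongSupplier clock; // injected so the logic can be tested without sleeping

    private long maxEventTimestamp = Long.MIN_VALUE;
    private long lastEventSeenAt;
    private long lastWatermark = Long.MIN_VALUE;

    IdleAwareWatermarkGenerator(long maxOutOfOrdernessMs, long idleTimeoutMs, LongSupplier clock) {
        this.maxOutOfOrdernessMs = maxOutOfOrdernessMs;
        this.idleTimeoutMs = idleTimeoutMs;
        this.clock = clock;
        this.lastEventSeenAt = clock.getAsLong();
    }

    /** Call for every record; remembers the event time and when the record was seen. */
    long onEvent(long eventTimestamp) {
        maxEventTimestamp = Math.max(maxEventTimestamp, eventTimestamp);
        lastEventSeenAt = clock.getAsLong();
        return eventTimestamp;
    }

    /** Call periodically; returns a watermark that never moves backwards. */
    long currentWatermark() {
        long now = clock.getAsLong();
        long candidate;
        if (now - lastEventSeenAt >= idleTimeoutMs) {
            // Source looks idle: derive the watermark from processing time.
            candidate = now - maxOutOfOrdernessMs;
        } else if (maxEventTimestamp == Long.MIN_VALUE) {
            // No event seen yet and not idle long enough: nothing to report.
            candidate = Long.MIN_VALUE;
        } else {
            // Normal case: trail the highest event timestamp seen so far.
            candidate = maxEventTimestamp - maxOutOfOrdernessMs;
        }
        lastWatermark = Math.max(lastWatermark, candidate);
        return lastWatermark;
    }
}
```

One consequence of this design is worth keeping in mind: after the watermark has advanced on processing time, records that arrive later with older event timestamps will be treated as late by downstream windows.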

(In reply to William Saar's message of 14 Dec 2018, 09:52.)




Re: Generating processing time watermarks in idle event time kafka streams?

William Saar-2

Thanks, works great! This should be very useful for real-time dashboards that need to compute in event time, especially for multi-tenant systems or other specialized Kafka topics that can have gaps in traffic.
(In reply to Aljoscha Krettek's message of Fri, 14 Dec 2018 10:51:01 +0100.)