Generate watermarks per key in a KeyedStream

classic Classic list List threaded Threaded
4 messages Options
Reply | Threaded
Open this post in threaded view
|

Generate watermarks per key in a KeyedStream

Shailesh Jain
Hi,

I'm working on implementing a use case wherein different physical devices are sending events, and due to network/power issues, there can be a delay in receiving events at Flink source. One of the operators within the flink job is the Pattern operator, and there are certain patterns which are time sensitive, so I'm using Event time characteristic. But the problem comes when there are unpredictable delays in events from a particular device(s), which causes those events to be dropped (as I cannot really define a static bound to allow for lateness).

Since I'm using a KeyedStream, keyed on the source device ID, is there a way to allow each CEP operator instance (one per key) to progress its time based on the event time in the corresponding stream partition. Or in other words, is there a way to generate watermarks per partition in a KeyedStream?

Thanks,
Shailesh
Reply | Threaded
Open this post in threaded view
|

Re: Generate watermarks per key in a KeyedStream

Xingcan Cui
Hi Shailesh,

actually, the watermarks are generated per partition, but all of them will be forcibly aligned to the minimum one during processing. That is decided by the semantics of watermark and KeyedStream, i.e., the watermarks belong to a whole stream and a stream is made up of different partitions (one per key).

If the physical devices work in different time systems due to delay, the event streams from them should be treated separately.

Hope that helps.

Best,
Xingcan

On Wed, Nov 8, 2017 at 11:48 PM, Shailesh Jain <[hidden email]> wrote:
Hi,

I'm working on implementing a use case wherein different physical devices are sending events, and due to network/power issues, there can be a delay in receiving events at Flink source. One of the operators within the flink job is the Pattern operator, and there are certain patterns which are time sensitive, so I'm using Event time characteristic. But the problem comes when there are unpredictable delays in events from a particular device(s), which causes those events to be dropped (as I cannot really define a static bound to allow for lateness).

Since I'm using a KeyedStream, keyed on the source device ID, is there a way to allow each CEP operator instance (one per key) to progress its time based on the event time in the corresponding stream partition. Or in other words, is there a way to generate watermarks per partition in a KeyedStream?

Thanks,
Shailesh

Reply | Threaded
Open this post in threaded view
|

Re: Generate watermarks per key in a KeyedStream

Shailesh Jain
Thanks for your reply, Xingcan.

On Wed, Nov 8, 2017 at 10:42 PM, Xingcan Cui <[hidden email]> wrote:
Hi Shailesh,

actually, the watermarks are generated per partition, but all of them will be forcibly aligned to the minimum one during processing. That is decided by the semantics of watermark and KeyedStream, i.e., the watermarks belong to a whole stream and a stream is made up of different partitions (one per key).

If the physical devices work in different time systems due to delay, the event streams from them should be treated separately.

Hope that helps.

Best,
Xingcan

On Wed, Nov 8, 2017 at 11:48 PM, Shailesh Jain <[hidden email]> wrote:
Hi,

I'm working on implementing a use case wherein different physical devices are sending events, and due to network/power issues, there can be a delay in receiving events at Flink source. One of the operators within the flink job is the Pattern operator, and there are certain patterns which are time sensitive, so I'm using Event time characteristic. But the problem comes when there are unpredictable delays in events from a particular device(s), which causes those events to be dropped (as I cannot really define a static bound to allow for lateness).

Since I'm using a KeyedStream, keyed on the source device ID, is there a way to allow each CEP operator instance (one per key) to progress its time based on the event time in the corresponding stream partition. Or in other words, is there a way to generate watermarks per partition in a KeyedStream?

Thanks,
Shailesh


Reply | Threaded
Open this post in threaded view
|

Re: Generate watermarks per key in a KeyedStream

Derek VerLee

We are contending with the same issue, as it happens.  We have dozens, and potentially down the line, may need to deal with thousands of different "time systems" as you put it, and may not be know at compile time or job start time.  In a practical sense, how could such a system be composed? 


On 11/9/17 5:52 AM, Shailesh Jain wrote:
Thanks for your reply, Xingcan.

On Wed, Nov 8, 2017 at 10:42 PM, Xingcan Cui <[hidden email]> wrote:
Hi Shailesh,

actually, the watermarks are generated per partition, but all of them will be forcibly aligned to the minimum one during processing. That is decided by the semantics of watermark and KeyedStream, i.e., the watermarks belong to a whole stream and a stream is made up of different partitions (one per key).

If the physical devices work in different time systems due to delay, the event streams from them should be treated separately.

Hope that helps.

Best,
Xingcan

On Wed, Nov 8, 2017 at 11:48 PM, Shailesh Jain <[hidden email]> wrote:
Hi,

I'm working on implementing a use case wherein different physical devices are sending events, and due to network/power issues, there can be a delay in receiving events at Flink source. One of the operators within the flink job is the Pattern operator, and there are certain patterns which are time sensitive, so I'm using Event time characteristic. But the problem comes when there are unpredictable delays in events from a particular device(s), which causes those events to be dropped (as I cannot really define a static bound to allow for lateness).

Since I'm using a KeyedStream, keyed on the source device ID, is there a way to allow each CEP operator instance (one per key) to progress its time based on the event time in the corresponding stream partition. Or in other words, is there a way to generate watermarks per partition in a KeyedStream?

Thanks,
Shailesh