coordinate watermarks between jobs?

coordinate watermarks between jobs?

xiatao123
Hi All,
  I am trying to replay events from 3 different sources, ideally in time sequence, say Stream1, Stream2, Stream3. Since their sizes vary a lot, the watermark on one stream advances much faster than on the others. Is there any way to coordinate the watermarks between different input streams?
Thanks,
Tao

Re: coordinate watermarks between jobs?

Fabian Hueske-2
Hi Tao,

The watermarks of operators that consume from two (or more) streams are always synced to the lowest watermark.
This behavior guarantees that data won't be late (unless it was late when watermarks were assigned). However, the operator will most likely need to buffer more events from the "faster" streams.
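
As a rough illustration of the rule above, here is a minimal sketch of how an operator with several inputs could derive its own watermark as the minimum over its inputs. This is not Flink's actual code: `MinWatermarkTracker` and `onWatermark` are made-up names, and the sketch assumes at least one input and ignores idle inputs.

```java
import java.util.Arrays;

/** Hypothetical helper (not a Flink class): remembers the latest watermark of each input. */
final class MinWatermarkTracker {
    private final long[] inputWatermarks;
    private long lastEmitted = Long.MIN_VALUE;

    MinWatermarkTracker(int numInputs) {
        // Until an input has sent a watermark, treat it as "infinitely far behind".
        inputWatermarks = new long[numInputs];
        Arrays.fill(inputWatermarks, Long.MIN_VALUE);
    }

    /** Records a watermark from one input and returns the operator's current watermark. */
    long onWatermark(int input, long watermark) {
        // A watermark never moves backwards on a single input.
        inputWatermarks[input] = Math.max(inputWatermarks[input], watermark);
        // The operator can only advance as far as its slowest input.
        long min = Arrays.stream(inputWatermarks).min().getAsLong();
        lastEmitted = Math.max(lastEmitted, min);
        return lastEmitted;
    }
}
```

With three inputs at watermarks 1000, 50 and 200, the operator's watermark is 50, so everything the faster inputs delivered with timestamps after 50 has to stay buffered until the slowest input catches up.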

Right now, it is not possible to throttle faster streams to the pace of the slowest stream.

Best, Fabian

2018-04-27 1:05 GMT+02:00 Tao Xia <[hidden email]>:
Hi All,
  I am trying to replay events from 3 different sources, ideally in time sequence, say Stream1, Stream2, Stream3. Since their sizes vary a lot, the watermark on one stream advances much faster than on the others. Is there any way to coordinate the watermarks between different input streams?
Thanks,
Tao

Re: coordinate watermarks between jobs?

xiatao123
Without throttling, the job will eventually run out of memory.
I think this is a very common use case for Flink users during stream replay or reprocessing.
Is there any feature planned for this? I would like to contribute to the initiative.

On Wed, May 2, 2018 at 2:43 AM, Fabian Hueske <[hidden email]> wrote:
Hi Tao,

The watermarks of operators that consume from two (or more) streams are always synced to the lowest watermark.
This behavior guarantees that data won't be late (unless it was late when watermarks were assigned). However, the operator will most likely need to buffer more events from the "faster" streams.

Right now, it is not possible to throttle faster streams to the pace of the slowest stream.

Best, Fabian

Re: coordinate watermarks between jobs?

Eron Wright
It might be possible to apply backpressure to the channels that are significantly ahead in event time. Tao, it would not be trivial, but if you'd like to investigate more deeply, take a look at the Flink runtime's `StatusWatermarkValve` and the associated stream input processors to see how an operator integrates incoming watermarks. A key challenge would be to apply backpressure to the upstream channel for reasons other than the availability of network buffers. Take a look at FLINK-7282, which introduced a credit system that may be useful here.
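
To make the idea a bit more concrete, here is a minimal sketch of just the decision step: given the latest watermark of each input channel, which channels are far enough ahead that they should stop being read. Everything here is hypothetical (`DriftThrottle`, `inputsToPause`, and the `maxDriftMillis` threshold are illustrative names, not Flink APIs), and the hard part, actually pausing a channel inside the runtime, is not shown.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical policy (not a Flink class): flags inputs that run too far ahead in event time. */
final class DriftThrottle {
    private final long maxDriftMillis; // assumed to be >= 0

    DriftThrottle(long maxDriftMillis) {
        this.maxDriftMillis = maxDriftMillis;
    }

    /** Returns the indexes of inputs whose watermark exceeds the slowest one by more than the allowed drift. */
    List<Integer> inputsToPause(long[] watermarks) {
        long slowest = Long.MAX_VALUE;
        for (long w : watermarks) {
            slowest = Math.min(slowest, w);
        }
        // Guard against overflow when the slowest watermark is close to Long.MAX_VALUE.
        long limit = slowest > Long.MAX_VALUE - maxDriftMillis
                ? Long.MAX_VALUE
                : slowest + maxDriftMillis;
        List<Integer> toPause = new ArrayList<>();
        for (int i = 0; i < watermarks.length; i++) {
            if (watermarks[i] > limit) {
                toPause.add(i);
            }
        }
        return toPause;
    }
}
```

For example, with an allowed drift of 60000 ms and watermarks 100000, 10000 and 65000, only the first input would be paused, which bounds how much data the operator has to buffer for it.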

On Fri, May 4, 2018 at 10:07 AM, Tao Xia <[hidden email]> wrote:
Without throttling, the job will eventually run out of memory.
I think this is a very common use case for Flink users during stream replay or reprocessing.
Is there any feature planned for this? I would like to contribute to the initiative.
