Changing watermark in the middle of a flow

classic Classic list List threaded Threaded
2 messages Options
Reply | Threaded
Open this post in threaded view
|

Changing watermark in the middle of a flow

Lorenzo Nicora
Hi

I have a linear streaming flow with a single source and multiple sinks to publish intermediate results.
The time characteristic is Event Time and I am adding one AssignerWithPeriodicWatermarks immediately after the source.
I need to add a different assigner, in the middle of the flow, to change the event time - i.e. extracting a different field from the record as event time.

I am not sure I completely understand the implications of changing event time and watermark in the middle of a flow.

Can anybody give me a hint or direct me to any relevant documentation?

Lorenzo
Reply | Threaded
Open this post in threaded view
|

Re: Changing watermark in the middle of a flow

Kostas Kloudas-2
Hi Lorenzo,

If you want to learn how Flink uses watermarks, it is worth checking [1].

Now in a nutshell, what a watermark will do in a pipeline is that it
may fire timers that you may have registered, or windows that you may
have accumulated.
If you have no time-sensitive operations between the first and the
second watermark generators, then I do not think you have to worry
(although it would help if you could share a bit more about your
pipeline in order to have a more educated estimation). If you have
windows, then your windows will fire and the emitted elements will
have the timestamp of the end of the window.

After the second watermark assigner, the watermarks coming from the
previous one are discarded and they are not flowing in the pipeline
anymore. You will only have the new watermarks.

A problem may arise if, for example, the second watermark generator
emits watermarks with smaller values than the first but the timestamps
of the elements are assigned based on the progress of the first
generator (e.g. windows fired) and now all your elements are
considered "late".

I hope that the above paint the big picture of what is happening in
your pipeline.

Again, I may be missing something so feel free to send more details
about your pipeline so that we can help a bit more.

Cheers,
Kostas

[1] https://ci.apache.org/projects/flink/flink-docs-release-1.11/concepts/timely-stream-processing.html

On Wed, Jul 22, 2020 at 9:35 AM Lorenzo Nicora <[hidden email]> wrote:

>
> Hi
>
> I have a linear streaming flow with a single source and multiple sinks to publish intermediate results.
> The time characteristic is Event Time and I am adding one AssignerWithPeriodicWatermarks immediately after the source.
> I need to add a different assigner, in the middle of the flow, to change the event time - i.e. extracting a different field from the record as event time.
>
> I am not sure I completely understand the implications of changing event time and watermark in the middle of a flow.
>
> Can anybody give me a hint or direct me to any relevant documentation?
>
> Lorenzo