Collect event which arrive after watermark

shishal
Hi Flink community members,

I am new to Flink stream processing. I am using event-time processing and
keyed streams.

Sorry if my question sounds silly, but is there a way to collect (or log) the
late events that arrive after the watermark? I need to gather these
stats for further analysis.

Thanks,
Shishal

--
Sent from: http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/

Re: Collect event which arrive after watermark

Fabian Hueske-2
Hi,

you can do that with a ProcessFunction [1].
The Context parameter of the ProcessFunction.processElement() method gives access to the timestamp of the current element (Context.timestamp()) and, through its TimerService, to the current watermark (Context.timerService().currentWatermark()).

In case you don't just want to log the late data but send it to a different DataStream (and sink it to a Kafka topic, file, or whatever) you can use side outputs [2].
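As a rough illustration of the check described above, here is a plain-Java sketch with no Flink dependency. The class LateEventSplitter, the Event type, and the two lists are hypothetical stand-ins: in a real job the two long values would come from the Context (timestamp() and timerService().currentWatermark()), the on-time list would be the Collector, and the late list would be a side output written with Context.output(...) on an OutputTag. It assumes an element counts as late when its timestamp is not greater than the current watermark.

```java
import java.util.ArrayList;
import java.util.List;

// Plain-Java model of the lateness check; no Flink classes are used.
final class LateEventSplitter {

    // Hypothetical event type, for illustration only.
    static final class Event {
        final String key;
        final long timestamp;

        Event(String key, long timestamp) {
            this.key = key;
            this.timestamp = timestamp;
        }
    }

    final List<Event> onTime = new ArrayList<>(); // stands in for the main output
    final List<Event> late = new ArrayList<>();   // stands in for a side output

    // Assumption: a watermark W means "no more elements with timestamp <= W",
    // so an element with timestamp <= W is treated as late.
    static boolean isLate(long elementTimestamp, long currentWatermark) {
        return elementTimestamp <= currentWatermark;
    }

    // Mirrors the shape of processElement(): inspect one element, route it.
    void processElement(Event event, long currentWatermark) {
        if (isLate(event.timestamp, currentWatermark)) {
            late.add(event);   // log it, count it, or send it elsewhere
        } else {
            onTime.add(event); // forward unchanged to downstream operators
        }
    }

    public static void main(String[] args) {
        LateEventSplitter splitter = new LateEventSplitter();
        splitter.processElement(new Event("a", 100L), 50L);
        splitter.processElement(new Event("b", 40L), 50L);
        System.out.println("on time: " + splitter.onTime.size()
                + ", late: " + splitter.late.size());
    }
}
```

Keeping the comparison in a small static method makes it easy to unit-test the lateness rule separately from the routing.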

Best, Fabian

[1] https://ci.apache.org/projects/flink/flink-docs-release-1.4/dev/stream/operators/process_function.html
[2] https://ci.apache.org/projects/flink/flink-docs-release-1.4/dev/stream/side_output.html


(In reply to shishal's message of 2018-04-04 13:03 GMT+02:00.)

Re: Collect event which arrive after watermark

shishal
Thanks Fabian. My understanding was that late events older than the watermark are
dropped, so the ProcessFunction won't be called for late events. I guess my
understanding was wrong. Or is there something more to it?


Re: Collect event which arrive after watermark

Fabian Hueske-2
Window operators drop late events by default. By the time they receive a late event, they have already computed and emitted a result for the window it belongs to.
Since there is no good default behavior to handle a late event in this case, it is simply dropped.
However, Flink offers multiple ways to explicitly handle late events such as sending them to a side output or computing and emitting an updated result.

So, if you apply a ProcessFunction before any other operator, you can handle all late events.
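To make the window behavior above more concrete, here is a small plain-Java model of a tumbling count window. It is only a sketch and not Flink's implementation: the class TumblingCountWindow and all of its fields are hypothetical. It assumes a window fires once the watermark reaches its last timestamp, that state is kept for an extra allowedLateness, and that anything arriving after that goes to a late list. In Flink itself the corresponding knobs on a windowed stream are allowedLateness(...) and sideOutputLateData(...); without the latter, such elements are dropped.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Plain-Java model of a tumbling event-time count window; no Flink classes.
final class TumblingCountWindow {
    final long size;            // window length in event-time units
    final long allowedLateness; // how long window state is kept after firing
    long watermark = Long.MIN_VALUE;

    final Map<Long, Integer> counts = new TreeMap<>();   // window start -> count
    final List<String> emitted = new ArrayList<>();      // results as "start=count"
    final List<Long> lateSideOutput = new ArrayList<>(); // timestamps that came too late

    TumblingCountWindow(long size, long allowedLateness) {
        this.size = size;
        this.allowedLateness = allowedLateness;
    }

    void onElement(long timestamp) {
        long start = timestamp - Math.floorMod(timestamp, size);
        long maxTimestamp = start + size - 1;
        if (maxTimestamp + allowedLateness <= watermark) {
            // The window's state is already gone: route to the late list.
            lateSideOutput.add(timestamp);
            return;
        }
        int count = counts.merge(start, 1, Integer::sum);
        if (maxTimestamp <= watermark) {
            // Window already fired but is within allowed lateness:
            // emit an updated result right away.
            emitted.add(start + "=" + count);
        }
    }

    void onWatermark(long newWatermark) {
        // Fire every window whose end was passed by this watermark advance.
        for (Map.Entry<Long, Integer> entry : counts.entrySet()) {
            long maxTimestamp = entry.getKey() + size - 1;
            if (maxTimestamp > watermark && maxTimestamp <= newWatermark) {
                emitted.add(entry.getKey() + "=" + entry.getValue());
            }
        }
        watermark = newWatermark;
        // Drop state of windows whose allowed lateness has expired.
        counts.keySet().removeIf(s -> s + size - 1 + allowedLateness <= newWatermark);
    }

    public static void main(String[] args) {
        TumblingCountWindow window = new TumblingCountWindow(10L, 0L);
        window.onElement(1L);
        window.onElement(2L);
        window.onWatermark(9L);
        window.onElement(3L); // arrives after the window fired
        System.out.println("emitted=" + window.emitted
                + " late=" + window.lateSideOutput);
    }
}
```

With allowedLateness set to 0 the third element ends up in the late list; with a positive value it would instead trigger an updated result for the same window, which corresponds to the two options mentioned above.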

Best, Fabian

(In reply to shishal's message of 2018-04-04 23:26 GMT+02:00.)

Re: Collect event which arrive after watermark

Xingcan Cui
In reply to this post by shishal
Hi Shishal,

You could manually separate the late events from the main stream in your ProcessFunction by using side outputs [1].

Best,
Xingcan

[1] https://ci.apache.org/projects/flink/flink-docs-master/dev/stream/side_output.html

(In reply to shishal's message of 4 Apr 2018, 7:03 PM.)