(DEPRECATED) Apache Flink User Mailing List archive.

Modelling time for complex events generated out of simple ones

Salva Alcántara

Modelling time for complex events generated out of simple ones

My Flink application generates (complex) output events by processing
(simple) input events. The generated output events are to be consumed by
other external services. My application uses event-time semantics, so I am
a bit unsure about what I should use as the output events' timestamp.

Should I use:

- the processing time at the moment of generating them?
- the event time (given by the watermark value)?
- both?

For my use case, I am using both for now. But maybe you can come up with
examples/justifications for each of the given options.

--
Sent from: http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/

Yun Tang

Re: Modelling time for complex events generated out of simple ones

Hi Salva

I think this depends on the relationship between your output and input events. If the output events are just simple wrappers around the input ones, e.g. adding a few properties, or reading from one place and writing to another, I think the output events could carry the time inherited from the input events. That is to say, event-time semantics might be more appropriate.

On the other hand, if the output events are more independent of the input ones, so that the tasks in the Flink TM can be treated as the event generator, I think you can use the processing time at which they are generated.

I think there are no absolute rules; it all depends on your actual scenario.

Best

Yun Tang

From: Salva Alcántara <[hidden email]>
Sent: Monday, April 20, 2020 2:03
To: [hidden email] <[hidden email]>
Subject: Modelling time for complex events generated out of simple ones

Salva Alcántara

Re: Modelling time for complex events generated out of simple ones

In my case, the relationship between input and output events is that output
events are generated out of some rules based on input events. Essentially,
output events correspond to specific patterns / sequences of input events.
You can think of output events as detecting certain anomalies or abnormal
conditions. So I guess we are more in the second case you mention where the
Flink TM can be regarded as a generator and hence using the processing time
makes sense.

Indeed, I am attaching both the processing time and the event-time watermark
value at the moment of generating the output events. I think both convey
useful information. In particular, the processing time looks like the logical
timestamp for the output events. However, although it would be the
exception, my Flink app might at some point be processing old data.
That is why I am also adding a second timestamp with the current
event-time watermark value. This allows the consumer of the output events to
detect whether an output event corresponds to old data (by comparing the
processing-time and event-time timestamps, which under normal conditions
should be close to each other, except when processing old data).
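To make the comparison concrete, here is a minimal sketch of what the consumer-side check could look like. It is plain Java with no Flink dependency; the class and field names and the 5-minute tolerance are all made up for illustration. On the Flink side, the two values would typically be read inside a ProcessFunction via ctx.timerService().currentWatermark() and ctx.timerService().currentProcessingTime().

```java
import java.time.Duration;

/** Hypothetical output event carrying both timestamps (names are illustrative). */
final class OutputEvent {
    final String description;
    final long watermarkMillis;   // event-time watermark when the event was generated
    final long generatedAtMillis; // processing (wall-clock) time when the event was generated

    OutputEvent(String description, long watermarkMillis, long generatedAtMillis) {
        this.description = description;
        this.watermarkMillis = watermarkMillis;
        this.generatedAtMillis = generatedAtMillis;
    }

    /**
     * Gap between processing time and event time at generation.
     * Assumes a real watermark was already available (not the initial Long.MIN_VALUE).
     */
    Duration lag() {
        return Duration.ofMillis(generatedAtMillis - watermarkMillis);
    }

    /** True when the gap exceeds the tolerance, suggesting old data was being processed. */
    boolean isFromOldData(Duration tolerance) {
        return lag().compareTo(tolerance) > 0;
    }
}

final class OutputEventDemo {
    public static void main(String[] args) {
        long now = System.currentTimeMillis();
        Duration tolerance = Duration.ofMinutes(5); // assumed threshold, tune per use case

        // Normal operation: the watermark trails wall-clock time by a couple of seconds.
        OutputEvent live = new OutputEvent("anomaly A", now - 2_000L, now);
        // Reprocessing: the watermark is a day behind wall-clock time.
        OutputEvent replayed =
                new OutputEvent("anomaly B", now - Duration.ofDays(1).toMillis(), now);

        System.out.println(live.isFromOldData(tolerance));     // prints false
        System.out.println(replayed.isFromOldData(tolerance)); // prints true
    }
}
```

The tolerance has to exceed the usual watermark delay (out-of-orderness bound plus idle periods); otherwise events produced under normal conditions would also be flagged.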

In the case of using both, what naming would you use for the two fields?
Something along the lines of event_time and processing_time seems to leak
implementation details of my app to the external services...

Arvid Heise-3

Re: Modelling time for complex events generated out of simple ones

We had a longer discussion on Stack Overflow [1], so I'm adding a cross-link in case other users land here first.

[1] https://stackoverflow.com/questions/61309174/modelling-time-for-complex-events-generated-out-of-simple-ones/

Arvid Heise | Senior Java Developer
