My Flink application generates output (complex) events based on the processing of (simple) input events. The generated output events are to be consumed by other external services. My application works using event-time semantics, so I am a bit in doubt regarding what I should use as the output events' timestamp. Should I use:

- the processing time at the moment of generating them?
- the event time (given by the watermark value)?
- both?

For my use case, I am using both for now. But maybe you can come up with examples/justifications for each of the given options.

-- Sent from: http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/
Hi Salva
I think this depends on the relationship between your output and input events. If the output events are just simple wrappers of the input ones, e.g. adding some simple properties, or just reading from one place and writing to another, I think the output events could carry the timestamp inherited from the input ones. That is to say, event-time semantics might be more appropriate.
On the other hand, if the output events are more independent of the input ones, and the tasks in the Flink TM can be treated as the event generator, I think you can use the processing time at which they are generated as their timestamp.
I think there are no absolute rules; it all depends on your actual scenario.
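The two options above can be sketched in plain Java. This is a minimal illustration under stated assumptions, not Flink API code: `InputEvent`, `OutputEvent` and `TimestampStrategies` are hypothetical names, and inheriting the *latest* input timestamp is an assumption (the earliest one, or the end of a window, would be equally defensible). Inside a Flink `ProcessFunction`, the processing-time value would typically come from `ctx.timerService().currentProcessingTime()`; here it is passed in explicitly so the logic can be exercised without Flink.

```java
import java.util.List;

// Hypothetical simple input event; only its event-time timestamp matters here.
record InputEvent(String id, long eventTimeMillis) {}

// Hypothetical output (complex) event carrying a single timestamp.
record OutputEvent(String description, long timestampMillis) {}

final class TimestampStrategies {
    private TimestampStrategies() {}

    // Option 1: the output is essentially a wrapper of its inputs, so it
    // inherits their event time. Taking the latest input timestamp is an
    // assumption made for this sketch.
    static OutputEvent inheritEventTime(String description, List<InputEvent> inputs) {
        long latest = inputs.stream()
                .mapToLong(InputEvent::eventTimeMillis)
                .max()
                .orElseThrow(() -> new IllegalArgumentException("no input events"));
        return new OutputEvent(description, latest);
    }

    // Option 2: the operator acts as an event generator, so the output is
    // stamped with the processing (wall-clock) time at which it was created.
    static OutputEvent stampWithProcessingTime(String description, long processingTimeMillis) {
        return new OutputEvent(description, processingTimeMillis);
    }
}
```

Option 1 keeps output timestamps deterministic when the same input is replayed, while option 2 reflects when the detection actually happened.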
Best
Yun Tang
From: Salva Alcántara <[hidden email]>
Sent: Monday, April 20, 2020 2:03
To: [hidden email] <[hidden email]>
Subject: Modelling time for complex events generated out of simple ones
In my case, the output events are generated by rules applied to the input events. Essentially, output events correspond to specific patterns / sequences of input events; you can think of them as detecting certain anomalies or abnormal conditions. So I guess we are more in the second case you mention, where the Flink TM can be regarded as a generator, and hence using the processing time makes sense.

Indeed, I am using both the processing time and the event-time watermark value at the moment of generating the output events. I think both convey useful information. In particular, the processing time looks like the logical timestamp for the output events. However, although it would be an exception, my Flink app might also be processing old data at some point. That is why I am also adding another timestamp with the current event-time watermark value. This allows the consumer of the output events to detect whether an output event corresponds to old data or not, by comparing the processing-time and event-time timestamps, which under normal conditions should be close to each other (except when processing old data).

In the case of using both, what naming would you use for the two fields? Something along the lines of event_time and processing_time seems to leak implementation details of my app to the external services...
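A sketch of an output event carrying both timestamps, together with the consumer-side check described above. Assumptions: the record and field names (`DetectedAnomaly`, `detectedAtMillis`, `dataTimeMillis`) are hypothetical, chosen only to avoid Flink-specific terms such as event_time/processing_time, and the "old data" threshold is left to the consumer. The `Long.MIN_VALUE` sentinel is assumed to mirror the value Flink's `TimerService.currentWatermark()` reports before any watermark has arrived; it is special-cased so the subtraction cannot overflow.

```java
import java.util.OptionalLong;

// Hypothetical output event: when the anomaly was detected (processing time)
// and how far the input data had progressed at that moment (watermark).
record DetectedAnomaly(String rule, long detectedAtMillis, long dataTimeMillis) {

    // Sentinel for "no watermark yet" (assumed to match Flink's initial watermark).
    static final long NO_WATERMARK = Long.MIN_VALUE;

    // Gap between detection time and data time; empty if no watermark was known.
    OptionalLong lagMillis() {
        if (dataTimeMillis == NO_WATERMARK) {
            return OptionalLong.empty();
        }
        return OptionalLong.of(detectedAtMillis - dataTimeMillis);
    }

    // Consumer-side heuristic: under normal conditions the two timestamps are
    // close, so a lag above the threshold suggests the job was processing old data.
    boolean looksLikeOldData(long maxLagMillis) {
        OptionalLong lag = lagMillis();
        return lag.isPresent() && lag.getAsLong() > maxLagMillis;
    }
}
```

For example, an anomaly emitted at 1,000,000 ms with the watermark at 998,000 ms has a lag of 2,000 ms, well under a one-minute threshold, whereas the same anomaly produced during a backfill with the watermark at 100,000 ms would be flagged as old data.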
We had a larger discussion on Stack Overflow [1], so I'm adding a cross-link in case any other user comes here first.

On Mon, Apr 20, 2020 at 6:52 AM Salva Alcántara <[hidden email]> wrote:
> In my case, the relationship between input and output events is that output

-- Arvid Heise | Senior Java Developer