Hi all
I'm doing a fold on a sliding window, using TimeCharacteristic.EventTime. For output I pick the timestamp of the most recent event in the window and use that to name the output file.

My question is: will a second run of Flink on the same set of data (from Kafka) put the same events in the same windows, or do the window boundaries somehow depend on the wall-clock time of the run?

The windows I'm using are two sliding timeWindows and one timeWindowAll.

Thanks for any answers,
Bart van Deenen
If you use event time, a second run will put exactly the same tuples into the windows. Event time means that the timestamp is encoded in the tuple itself, so window assignment is independent of the wall-clock time.

However, be aware that the order of tuples *within a window* might change! Thus, the timestamp of the "most recent event in the window" might change if you take it from whichever tuple happens to arrive last...

-Matthias

(In reply to Bart van Deenen's message of 03/29/2016 09:35 AM.)
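To illustrate why event-time window membership does not depend on when the job runs, here is a minimal plain-Java sketch of sliding-window assignment. It is not Flink's own code; the class name `SlidingWindowSketch`, the method `windowStarts`, and the example sizes are assumptions made for illustration, and the sketch assumes windows aligned to the epoch with no offset. The only inputs are the event's timestamp and the window parameters, so a replay of the same data yields the same windows.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration, not Flink's implementation.
final class SlidingWindowSketch {

    /**
     * Returns the start timestamps of every sliding window [start, start + size)
     * that contains the given event timestamp. All values are in milliseconds.
     * Windows are assumed to be aligned to multiples of the slide.
     */
    static List<Long> windowStarts(long timestamp, long size, long slide) {
        List<Long> starts = new ArrayList<>();
        // Latest window start that is <= timestamp.
        long lastStart = timestamp - Math.floorMod(timestamp, slide);
        // Walk backwards while the window still covers the timestamp.
        for (long start = lastStart; start > timestamp - size; start -= slide) {
            starts.add(start);
        }
        return starts;
    }

    public static void main(String[] args) {
        // Event at t = 12.5 s, windows of 10 s sliding every 5 s.
        System.out.println(windowStarts(12_500L, 10_000L, 5_000L)); // prints [10000, 5000]
    }
}
```

Nothing in `windowStarts` reads the system clock, which is the property the answer above relies on.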
Great!
I'm actually taking the max of the timestamps, so I should be fine.

Thanks,
Bart

(In reply to Matthias J. Sax's message of Tue, Mar 29, 2016, 09:48.)
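The difference between the two approaches can be sketched in plain Java. This is a hypothetical illustration rather than the poster's actual fold function: `foldMax` and `foldLast` are invented names, and the lists stand in for the same window contents arriving in two different orders.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration of order sensitivity in a fold.
final class MaxTimestampFoldSketch {

    /** Folds with max: the result does not depend on arrival order. */
    static long foldMax(List<Long> timestamps) {
        long acc = Long.MIN_VALUE;
        for (long ts : timestamps) {
            acc = Math.max(acc, ts);
        }
        return acc;
    }

    /** Folds by keeping the last element seen: the result depends on arrival order. */
    static long foldLast(List<Long> timestamps) {
        long acc = Long.MIN_VALUE;
        for (long ts : timestamps) {
            acc = ts;
        }
        return acc;
    }

    public static void main(String[] args) {
        // Same window contents, two different arrival orders.
        List<Long> firstRun = Arrays.asList(1_000L, 3_000L, 2_000L);
        List<Long> secondRun = Arrays.asList(3_000L, 2_000L, 1_000L);

        System.out.println(foldMax(firstRun) == foldMax(secondRun));   // prints true
        System.out.println(foldLast(firstRun) == foldLast(secondRun)); // prints false
    }
}
```

Because max is commutative and associative, the output name derived from it stays stable across runs even if tuples within a window are reordered.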
Hi,
which version of Flink are you using, and do you have a custom timestamp extractor/watermark extractor? The semantics of this changed between 0.10 and 1.0, and I just want to make sure that you get the correct behavior.

Cheers,
Aljoscha

(In reply to Bart van Deenen's message of Tue, 29 Mar 2016, 10:13.)