window limits ?

Bart van Deenen
Hi all

I'm doing a fold on a sliding window, using
TimeCharacteristic.EventTime. For output I'm picking the timestamp of
the most recent event in the window, and using that to name the output
file.

My question is: will a second run of Flink on the same set of data (from
Kafka) put the same events in a window, or are the limits of a window
somehow dependent on the real time of the run?
The windows I'm using are two sliding timeWindows and one timeWindowAll.

Thanks for any answers

Bart van Deenen

Re: window limits ?

Matthias J. Sax-2
If you use event time, a second run will put the exact same tuples into
the windows (event time implies that the timestamp is encoded in the
tuple itself, so window assignment is independent of the wall-clock time).

However, be aware that the order of tuples *within a window* might change!

Thus, if "most recent event in the window" means the last tuple to
arrive, its timestamp might differ between runs.
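
To illustrate why event-time windows do not depend on when the job runs, here is a minimal sketch in plain Python. It is not Flink code and does not use the Flink API; the function names `sliding_windows` and `assign` are made up for this example, and it assumes non-negative millisecond timestamps and windows aligned to the epoch. The window boundaries are computed from the event timestamp alone, so replaying the same events in a different arrival order yields the same window contents.

```python
def sliding_windows(timestamp_ms, size_ms, slide_ms):
    """Return every [start, end) sliding window that contains the timestamp.

    Only the event's own timestamp is used; no wall-clock time is read.
    """
    last_start = timestamp_ms - (timestamp_ms % slide_ms)
    windows = []
    start = last_start
    while start > timestamp_ms - size_ms:
        windows.append((start, start + size_ms))
        start -= slide_ms
    return windows


def assign(events, size_ms, slide_ms):
    """Group (timestamp_ms, value) events by window.

    Contents are kept as sets because arrival order is not guaranteed.
    """
    result = {}
    for ts, value in events:
        for window in sliding_windows(ts, size_ms, slide_ms):
            result.setdefault(window, set()).add((ts, value))
    return result


# Hypothetical events; the second "run" sees them in a different order.
events = [(1000, "a"), (4500, "b"), (7200, "c")]
run1 = assign(events, size_ms=10000, slide_ms=5000)
run2 = assign(list(reversed(events)), size_ms=10000, slide_ms=5000)
same_contents = run1 == run2
```

Both runs produce the same mapping from window to events, which is the property described above; only the order in which events reach a window can differ.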


-Matthias

On 03/29/2016 09:35 AM, Bart van Deenen wrote:

> Hi all
>
> I'm doing a fold on a sliding window, using
> TimeCharacteristic.EventTime. For output I'm picking the timestamp of
> the most recent event in the window, and using that to name the output
> file.
>
> My question is: will a second run of Flink on the same set of data (from
> Kafka) put the same events in a window, or are the limits of a window
> somehow dependent on the real time of the run?
> The windows I'm using are two sliding timeWindows and one timeWindowAll.
>
> Thanks for any answers
>
> Bart van Deenen
>


Re: window limits ?

Bart van Deenen
Great!

I'm actually taking the max of the timestamps, so I should be fine.
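
A small sketch of why taking the max is safe here, in plain Python rather than Flink code. The two fold functions and the sample timestamps are hypothetical, made up for this example. A fold that keeps the last timestamp it saw depends on arrival order, while a fold that keeps the maximum gives the same result for any ordering of the same events.

```python
from functools import reduce


def fold_last_seen(timestamps):
    """Keep whichever timestamp arrived last (order-dependent)."""
    return reduce(lambda acc, ts: ts, timestamps, None)


def fold_max(timestamps):
    """Keep the largest timestamp seen so far (order-independent)."""
    return reduce(
        lambda acc, ts: ts if acc is None or ts > acc else acc,
        timestamps,
        None,
    )


# The same window contents, arriving in a different order on each run.
run1 = [1000, 4500, 7200]
run2 = [7200, 1000, 4500]

max_is_stable = fold_max(run1) == fold_max(run2)
last_seen_is_stable = fold_last_seen(run1) == fold_last_seen(run2)
```

With these inputs `max_is_stable` is true and `last_seen_is_stable` is false, so an output file named after the max timestamp would get the same name on both runs.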

Thanks

Bart

On Tue, Mar 29, 2016, at 09:48, Matthias J. Sax wrote:

> If you use event time, a second run will put the exact same tuples into
> the windows (event time implies that the timestamp is encoded in the
> tuple itself, so window assignment is independent of the wall-clock time).
>
> However, be aware that the order of tuples *within a window* might
> change!
>
> Thus, if "most recent event in the window" means the last tuple to
> arrive, its timestamp might differ between runs.
>
>
> -Matthias

Re: window limits ?

Aljoscha Krettek
Hi,
Which version of Flink are you using, and do you have a custom timestamp extractor or watermark extractor? The semantics of this changed between 0.10 and 1.0, and I just want to make sure that you get the correct behavior.

Cheers,
Aljoscha

On Tue, 29 Mar 2016 at 10:13 Bart van Deenen wrote:

> Great!
>
> I'm actually taking the max of the timestamps, so I should be fine.
>
> Thanks
>
> Bart
