(DEPRECATED) Apache Flink User Mailing List archive.

Window based on tuple timestamps

Philipp Goetze

Window based on tuple timestamps

Hello again,

I have another question =). Could you provide an example of how to correctly use windows based on the timestamps carried by the tuples (i.e. not on real time)?

As a simple example I tried something like this:

val w4 = rdfSource.window(Time.of(200000L, customTimestamp, 1219625100000L)).mapWindow(customMap _).flatten().print()

In customMap I just collect all tuples of each window into one list. However, this does not work correctly: sometimes all the records end up in a single list, although the example data should yield two windows.

Best Regards,
Philipp
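
[Editorial sketch] To make the expected result concrete, here is a minimal plain-Scala sketch of timestamp-based tumbling windows. It does not use the Flink API. It assumes that the three arguments of Time.of above mean window length, timestamp extractor and start time. Record, windowIndex and assignWindows are hypothetical names introduced only for illustration.

```scala
object TumblingWindowSketch {
  // Hypothetical record type: a payload plus an event timestamp in milliseconds.
  final case class Record(value: String, timestamp: Long)

  // Index of the tumbling window that a timestamp falls into,
  // counted from the (assumed) start time of the first window.
  def windowIndex(timestamp: Long, size: Long, start: Long): Long =
    Math.floorDiv(timestamp - start, size)

  // Group records by window index and return the windows in ascending order.
  // The arrival order of the records does not influence the assignment.
  def assignWindows(records: Seq[Record], size: Long, start: Long): Seq[(Long, Seq[Record])] =
    records
      .groupBy(r => windowIndex(r.timestamp, size, start))
      .toSeq
      .sortBy(_._1)

  def main(args: Array[String]): Unit = {
    val start = 1219625100000L
    val size = 200000L
    // Made-up sample data with only two distinct timestamps.
    val records = Seq(
      Record("a", start + 1000L),
      Record("b", start + 250000L),
      Record("c", start + 1000L)
    )
    assignWindows(records, size, start).foreach { case (idx, rs) =>
      println(s"window $idx: ${rs.map(_.value).mkString(", ")}")
    }
  }
}
```

Because each record is assigned purely from its own timestamp, two distinct timestamps that are more than one window length apart always produce two windows in this sketch, whatever order the records arrive in.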

Aljoscha Krettek

Re: Window based on tuple timestamps

Hi Philipp,

Am I correct in assuming that your tuples do not arrive in the order of the timestamps that you extract? Unfortunately, the current windowing implementation does not work correctly in that case. We are working hard on fixing this for the upcoming 0.10 release, though. If you are interested in what we are doing there, you can check out the design documents in the Flink wiki:

https://cwiki.apache.org/confluence/display/FLINK/Streams+and+Operations+on+Streams

https://cwiki.apache.org/confluence/display/FLINK/Time+and+Order+in+Streams

Cheers,

Aljoscha
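
[Editorial sketch] The effect of out-of-order arrival can be illustrated with a deliberately naive windower in plain Scala. This is not Flink's implementation, only an assumed simplification: the current window is closed as soon as a record with a timestamp beyond its end arrives, and later records are never routed back to an earlier window. naiveWindows is a hypothetical name.

```scala
object OutOfOrderSketch {
  // Naive streaming windower over bare timestamps (assumed behaviour, for
  // illustration only). Records are appended to the current window until a
  // record at or beyond the window end arrives. Then the window is emitted
  // and the window end advances until it covers the new record.
  def naiveWindows(timestamps: Seq[Long], size: Long, start: Long): Seq[Seq[Long]] = {
    val closed = scala.collection.mutable.ArrayBuffer.empty[Seq[Long]]
    var current = Vector.empty[Long]
    var windowEnd = start + size
    for (ts <- timestamps) {
      if (ts >= windowEnd) {
        if (current.nonEmpty) closed += current
        current = Vector.empty[Long]
        while (ts >= windowEnd) windowEnd += size
      }
      // A late record (timestamp before the current window) is still appended
      // here, which is exactly the misassignment this sketch demonstrates.
      current = current :+ ts
    }
    if (current.nonEmpty) closed += current
    closed.toList
  }

  def main(args: Array[String]): Unit = {
    println(naiveWindows(Seq(1L, 2L, 250L), size = 200L, start = 0L))
    println(naiveWindows(Seq(1L, 250L, 2L), size = 200L, start = 0L))
  }
}
```

With in-order input the two windows contain the records they should. With the same records delivered out of order, the late record with timestamp 2 ends up in the second window.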

Philipp Goetze

Re: Window based on tuple timestamps

Hi Aljoscha,

For the big data sets I am not quite sure whether they are completely ordered, but my test data (just 10 records) is sorted and contains only two different timestamps. Sometimes I end up with two windows (which is correct) and sometimes with only one, so the result is not deterministic. Should I maybe set the parallelism to 1 somewhere?

Best Regards,
Philipp
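
[Editorial sketch] One possible explanation for the nondeterminism with sorted input is that records travel through parallel partitions and are merged downstream in an order that depends on scheduling. The plain-Scala sketch below only simulates this idea. How Flink actually distributes and merges records is not shown here, and roundRobin, merge and isSorted are hypothetical helpers.

```scala
object InterleavingSketch {
  // Distribute a sequence over n partitions in round-robin fashion
  // (an assumed distribution strategy, for illustration only).
  def roundRobin[A](xs: Seq[A], n: Int): Seq[Seq[A]] =
    (0 until n).map(p => xs.zipWithIndex.collect { case (x, i) if i % n == p => x })

  // Merge partitions according to a schedule. Each schedule entry names the
  // partition that delivers its next record. Entries that point at an
  // exhausted partition are skipped.
  def merge[A](partitions: Seq[Seq[A]], schedule: Seq[Int]): Seq[A] = {
    val iters = partitions.map(_.iterator).toVector
    schedule.toList.flatMap { p =>
      if (iters(p).hasNext) Some(iters(p).next()) else None
    }
  }

  // True if the timestamps are in non-decreasing order.
  def isSorted(xs: Seq[Long]): Boolean =
    xs.zip(xs.drop(1)).forall { case (a, b) => a <= b }

  def main(args: Array[String]): Unit = {
    val sorted = Seq(1L, 1L, 2L, 2L)
    val parts = roundRobin(sorted, 2)
    println(merge(parts, Seq(0, 1, 0, 1)))
    println(merge(parts, Seq(0, 0, 1, 1)))
  }
}
```

In this simulation the alternating schedule keeps the merged stream sorted, while draining one partition before the other does not, even though the source data was sorted. With a single partition there is only one possible order, which is consistent with the idea of setting the parallelism to 1.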

Aljoscha Krettek

Re: Window based on tuple timestamps

Hi,

Yes, setting the parallelism to 1 might solve it, but it will also cripple performance. The first part of our rework of the windowing system should make it into master during the next week, so you would then be able to test your programs against it, provided you are using the 0.10-SNAPSHOT version.

I'm sorry we don't have a better answer right now, but we are working hard on fixing these things.

Cheers,

Aljoscha
