Is the order guaranteed with Windowall

classic Classic list List threaded Threaded
5 messages Options
Reply | Threaded
Open this post in threaded view
|

Is the order guaranteed with Windowall

aldu29
Hello,

I use Watermarks and a function to sort events at the end of my pipeline.
I've used this tutorial to sort my data: https://training.da-platform.com/exercises/carSort.html
SingleOutputStreamOperator<XXXX> sortStream = streamKeyed.process(new SortEventFunction())..

Then I want to apply a Window and use AggregateFunction to obtain a group of data. Thus when a trigger is launched, I can push all these data to my backend at the same time (with puts method for Hbase for example)
But the order here must be guaranteed.
Can I use a windowAll on that stream ?
sortStream.windowAll(...

Thanks in advance
David


Reply | Threaded
Open this post in threaded view
|

Re: Is the order guaranteed with Windowall

Fabian Hueske-2
Hi,

A WindowAll is executed in a single task. If you sort the data before the window, the sorting must also happen in a single task, i.e., with parallelism 1.
The reasons is that an operator somewhat randomly merges multiple input partitions. So even if each input partition is sorted, the merging will result in out-of-order data.

Best,
Fabian

Am Sa., 2. Feb. 2019 um 17:11 Uhr schrieb [hidden email] <[hidden email]>:
Hello,

I use Watermarks and a function to sort events at the end of my pipeline.
I've used this tutorial to sort my data: https://training.da-platform.com/exercises/carSort.html
SingleOutputStreamOperator<XXXX> sortStream = streamKeyed.process(new SortEventFunction())..

Then I want to apply a Window and use AggregateFunction to obtain a group of data. Thus when a trigger is launched, I can push all these data to my backend at the same time (with puts method for Hbase for example)
But the order here must be guaranteed.
Can I use a windowAll on that stream ?
sortStream.windowAll(...

Thanks in advance
David


Reply | Threaded
Open this post in threaded view
|

Re: Is the order guaranteed with Windowall

aldu29
Hello Fabian,

Thanks !
According to your answers on this post https://stackoverflow.com/questions/50340107/order-of-events-with-chained-keyby-calls-on-same-key, if I'm right I can use my sort function followed by a keyby and use a Window for aggregate these events. And the order will be preserved if I use the same Key and the same partionning. I'm right ?

        SingleOutputStreamOperator<XXX> sortStream = conversionStreamKeyed.process(new SortEventFunction()).setParallelism(1).name("Sort events");

       // use the same key
        KeyedStream<XX, String> sortStreamKeyed = sortStream.keyBy((XXX event) -> event.getPartitionKey());

      sortStreamKeyed.window(TumblingProcessingTimeWindow....setParallelism(1).name("Aggregate events");

Thanks
David

On 2019/02/04 13:54:14, Fabian Hueske <[hidden email]> wrote:

> Hi,
>
> A WindowAll is executed in a single task. If you sort the data before the
> window, the sorting must also happen in a single task, i.e., with
> parallelism 1.
> The reasons is that an operator somewhat randomly merges multiple input
> partitions. So even if each input partition is sorted, the merging will
> result in out-of-order data.
>
> Best,
> Fabian
>
> Am Sa., 2. Feb. 2019 um 17:11 Uhr schrieb [hidden email] <
> [hidden email]>:
>
> > Hello,
> >
> > I use Watermarks and a function to sort events at the end of my pipeline.
> > I've used this tutorial to sort my data:
> > https://training.da-platform.com/exercises/carSort.html
> > SingleOutputStreamOperator<XXXX> sortStream = streamKeyed.process(new
> > SortEventFunction())..
> >
> > Then I want to apply a Window and use AggregateFunction to obtain a group
> > of data. Thus when a trigger is launched, I can push all these data to my
> > backend at the same time (with puts method for Hbase for example)
> > But the order here must be guaranteed.
> > Can I use a windowAll on that stream ?
> > sortStream.windowAll(...
> >
> > Thanks in advance
> > David
> >
> >
> >
>
Reply | Threaded
Open this post in threaded view
|

Re: Is the order guaranteed with Windowall

Fabian Hueske-2
Yes, I think that should work.

Best, Fabian

Am Mo., 4. Feb. 2019 um 18:35 Uhr schrieb [hidden email] <[hidden email]>:
Hello Fabian,

Thanks !
According to your answers on this post https://stackoverflow.com/questions/50340107/order-of-events-with-chained-keyby-calls-on-same-key, if I'm right I can use my sort function followed by a keyby and use a Window for aggregate these events. And the order will be preserved if I use the same Key and the same partionning. I'm right ?

        SingleOutputStreamOperator<XXX> sortStream = conversionStreamKeyed.process(new SortEventFunction()).setParallelism(1).name("Sort events");

       // use the same key
        KeyedStream<XX, String> sortStreamKeyed = sortStream.keyBy((XXX event) -> event.getPartitionKey());

      sortStreamKeyed.window(TumblingProcessingTimeWindow....setParallelism(1).name("Aggregate events");

Thanks
David

On 2019/02/04 13:54:14, Fabian Hueske <[hidden email]> wrote:
> Hi,
>
> A WindowAll is executed in a single task. If you sort the data before the
> window, the sorting must also happen in a single task, i.e., with
> parallelism 1.
> The reasons is that an operator somewhat randomly merges multiple input
> partitions. So even if each input partition is sorted, the merging will
> result in out-of-order data.
>
> Best,
> Fabian
>
> Am Sa., 2. Feb. 2019 um 17:11 Uhr schrieb [hidden email] <
> [hidden email]>:
>
> > Hello,
> >
> > I use Watermarks and a function to sort events at the end of my pipeline.
> > I've used this tutorial to sort my data:
> > https://training.da-platform.com/exercises/carSort.html
> > SingleOutputStreamOperator<XXXX> sortStream = streamKeyed.process(new
> > SortEventFunction())..
> >
> > Then I want to apply a Window and use AggregateFunction to obtain a group
> > of data. Thus when a trigger is launched, I can push all these data to my
> > backend at the same time (with puts method for Hbase for example)
> > But the order here must be guaranteed.
> > Can I use a windowAll on that stream ?
> > sortStream.windowAll(...
> >
> > Thanks in advance
> > David
> >
> >
> >
>
Reply | Threaded
Open this post in threaded view
|

Re: Is the order guaranteed with Windowall

aldu29
ok great.
Thanks !

On 2019/02/04 18:00:16, Fabian Hueske <[hidden email]> wrote:

> Yes, I think that should work.
>
> Best, Fabian
>
> Am Mo., 4. Feb. 2019 um 18:35 Uhr schrieb [hidden email] <
> [hidden email]>:
>
> > Hello Fabian,
> >
> > Thanks !
> > According to your answers on this post
> > https://stackoverflow.com/questions/50340107/order-of-events-with-chained-keyby-calls-on-same-key,
> > if I'm right I can use my sort function followed by a keyby and use a
> > Window for aggregate these events. And the order will be preserved if I use
> > the same Key and the same partionning. I'm right ?
> >
> >         SingleOutputStreamOperator<XXX> sortStream =
> > conversionStreamKeyed.process(new
> > SortEventFunction()).setParallelism(1).name("Sort events");
> >
> >        // use the same key
> >         KeyedStream<XX, String> sortStreamKeyed = sortStream.keyBy((XXX
> > event) -> event.getPartitionKey());
> >
> >
> > sortStreamKeyed.window(TumblingProcessingTimeWindow....setParallelism(1).name("Aggregate
> > events");
> >
> > Thanks
> > David
> >
> > On 2019/02/04 13:54:14, Fabian Hueske <[hidden email]> wrote:
> > > Hi,
> > >
> > > A WindowAll is executed in a single task. If you sort the data before the
> > > window, the sorting must also happen in a single task, i.e., with
> > > parallelism 1.
> > > The reasons is that an operator somewhat randomly merges multiple input
> > > partitions. So even if each input partition is sorted, the merging will
> > > result in out-of-order data.
> > >
> > > Best,
> > > Fabian
> > >
> > > Am Sa., 2. Feb. 2019 um 17:11 Uhr schrieb [hidden email] <
> > > [hidden email]>:
> > >
> > > > Hello,
> > > >
> > > > I use Watermarks and a function to sort events at the end of my
> > pipeline.
> > > > I've used this tutorial to sort my data:
> > > > https://training.da-platform.com/exercises/carSort.html
> > > > SingleOutputStreamOperator<XXXX> sortStream = streamKeyed.process(new
> > > > SortEventFunction())..
> > > >
> > > > Then I want to apply a Window and use AggregateFunction to obtain a
> > group
> > > > of data. Thus when a trigger is launched, I can push all these data to
> > my
> > > > backend at the same time (with puts method for Hbase for example)
> > > > But the order here must be guaranteed.
> > > > Can I use a windowAll on that stream ?
> > > > sortStream.windowAll(...
> > > >
> > > > Thanks in advance
> > > > David
> > > >
> > > >
> > > >
> > >
> >
>