Hello,
I use Watermarks and a function to sort events at the end of my pipeline. I've used this tutorial to sort my data: https://training.da-platform.com/exercises/carSort.html SingleOutputStreamOperator<XXXX> sortStream = streamKeyed.process(new SortEventFunction()).. Then I want to apply a Window and use AggregateFunction to obtain a group of data. Thus when a trigger is launched, I can push all these data to my backend at the same time (with puts method for Hbase for example) But the order here must be guaranteed. Can I use a windowAll on that stream ? sortStream.windowAll(... Thanks in advance David |
Hi, A WindowAll is executed in a single task. If you sort the data before the window, the sorting must also happen in a single task, i.e., with parallelism 1. The reasons is that an operator somewhat randomly merges multiple input partitions. So even if each input partition is sorted, the merging will result in out-of-order data. Best, Fabian Hello, |
Hello Fabian,
Thanks ! According to your answers on this post https://stackoverflow.com/questions/50340107/order-of-events-with-chained-keyby-calls-on-same-key, if I'm right I can use my sort function followed by a keyby and use a Window for aggregate these events. And the order will be preserved if I use the same Key and the same partionning. I'm right ? SingleOutputStreamOperator<XXX> sortStream = conversionStreamKeyed.process(new SortEventFunction()).setParallelism(1).name("Sort events"); // use the same key KeyedStream<XX, String> sortStreamKeyed = sortStream.keyBy((XXX event) -> event.getPartitionKey()); sortStreamKeyed.window(TumblingProcessingTimeWindow....setParallelism(1).name("Aggregate events"); Thanks David On 2019/02/04 13:54:14, Fabian Hueske <[hidden email]> wrote: > Hi, > > A WindowAll is executed in a single task. If you sort the data before the > window, the sorting must also happen in a single task, i.e., with > parallelism 1. > The reasons is that an operator somewhat randomly merges multiple input > partitions. So even if each input partition is sorted, the merging will > result in out-of-order data. > > Best, > Fabian > > Am Sa., 2. Feb. 2019 um 17:11 Uhr schrieb [hidden email] < > [hidden email]>: > > > Hello, > > > > I use Watermarks and a function to sort events at the end of my pipeline. > > I've used this tutorial to sort my data: > > https://training.da-platform.com/exercises/carSort.html > > SingleOutputStreamOperator<XXXX> sortStream = streamKeyed.process(new > > SortEventFunction()).. > > > > Then I want to apply a Window and use AggregateFunction to obtain a group > > of data. Thus when a trigger is launched, I can push all these data to my > > backend at the same time (with puts method for Hbase for example) > > But the order here must be guaranteed. > > Can I use a windowAll on that stream ? > > sortStream.windowAll(... > > > > Thanks in advance > > David > > > > > > > |
Yes, I think that should work. Best, Fabian Hello Fabian, |
ok great.
Thanks ! On 2019/02/04 18:00:16, Fabian Hueske <[hidden email]> wrote: > Yes, I think that should work. > > Best, Fabian > > Am Mo., 4. Feb. 2019 um 18:35 Uhr schrieb [hidden email] < > [hidden email]>: > > > Hello Fabian, > > > > Thanks ! > > According to your answers on this post > > https://stackoverflow.com/questions/50340107/order-of-events-with-chained-keyby-calls-on-same-key, > > if I'm right I can use my sort function followed by a keyby and use a > > Window for aggregate these events. And the order will be preserved if I use > > the same Key and the same partionning. I'm right ? > > > > SingleOutputStreamOperator<XXX> sortStream = > > conversionStreamKeyed.process(new > > SortEventFunction()).setParallelism(1).name("Sort events"); > > > > // use the same key > > KeyedStream<XX, String> sortStreamKeyed = sortStream.keyBy((XXX > > event) -> event.getPartitionKey()); > > > > > > sortStreamKeyed.window(TumblingProcessingTimeWindow....setParallelism(1).name("Aggregate > > events"); > > > > Thanks > > David > > > > On 2019/02/04 13:54:14, Fabian Hueske <[hidden email]> wrote: > > > Hi, > > > > > > A WindowAll is executed in a single task. If you sort the data before the > > > window, the sorting must also happen in a single task, i.e., with > > > parallelism 1. > > > The reasons is that an operator somewhat randomly merges multiple input > > > partitions. So even if each input partition is sorted, the merging will > > > result in out-of-order data. > > > > > > Best, > > > Fabian > > > > > > Am Sa., 2. Feb. 2019 um 17:11 Uhr schrieb [hidden email] < > > > [hidden email]>: > > > > > > > Hello, > > > > > > > > I use Watermarks and a function to sort events at the end of my > > pipeline. > > > > I've used this tutorial to sort my data: > > > > https://training.da-platform.com/exercises/carSort.html > > > > SingleOutputStreamOperator<XXXX> sortStream = streamKeyed.process(new > > > > SortEventFunction()).. > > > > > > > > Then I want to apply a Window and use AggregateFunction to obtain a > > group > > > > of data. Thus when a trigger is launched, I can push all these data to > > my > > > > backend at the same time (with puts method for Hbase for example) > > > > But the order here must be guaranteed. > > > > Can I use a windowAll on that stream ? > > > > sortStream.windowAll(... > > > > > > > > Thanks in advance > > > > David > > > > > > > > > > > > > > > > > > |
Free forum by Nabble | Edit this page |