Re ordering events with flink

classic Classic list List threaded Threaded
2 messages Options
Reply | Threaded
Open this post in threaded view
|

Re ordering events with flink

Vishwas Siravara
Hi guys,
I want to know if it's possible to sort events in a flink data stream. I know I can't sort a stream but is there a way in which I can buffer for a very short time and sort those events before sending it to a data sink. 

In our scenario we consume from a kafka topic which has multiple partitions but the data in these brokers are not partitioned by a key(its round robin) , for example we want to time order transactions associated with a particular account but since the same account number ends up in different partitions at the source for different transactions we are not able to maintain event time order in our stream processing system since the same account number ends up in different task managers and slots. We do however partition by account number when we send the events to downstream kafka sink so that transactions from the same account number end up in the same partition. This is however not good enough since the events are not sorted at the source.  

Any ideas for doing this is much appreciated. 


Best,
Vishwas 
Reply | Threaded
Open this post in threaded view
|

Re: Re ordering events with flink

David Anderson-2
Yes, it's possible to sort a stream by the event time timestamps on
the events. This depends on having reliable watermarks -- as events
that are late can not be emitted in order.

There are several ways to accomplish this. You could, for example, use
a ProcessFunction, and implement the sorting yourself, or use a
Window, or take advantage of the sorting that's already been
implemented in either the SQL or CEP libraries.

For example of doing this with windows, see
https://stackoverflow.com/questions/58539379/guarantee-of-event-time-order-in-flinkkafkaconsumer.
For an implementation using SQL, see
https://stackoverflow.com/a/54970489/2000823. For an implementation
using the CEP library, including the use of side outputs for late
events, see https://github.com/ververica/flink-training-exercises/blob/master/src/main/java/com/ververica/flinktraining/examples/datastream_java/cep/Sort.java.

Best,
David

On Fri, Nov 1, 2019 at 10:21 PM Vishwas Siravara <[hidden email]> wrote:

>
> Hi guys,
> I want to know if it's possible to sort events in a flink data stream. I know I can't sort a stream but is there a way in which I can buffer for a very short time and sort those events before sending it to a data sink.
>
> In our scenario we consume from a kafka topic which has multiple partitions but the data in these brokers are not partitioned by a key(its round robin) , for example we want to time order transactions associated with a particular account but since the same account number ends up in different partitions at the source for different transactions we are not able to maintain event time order in our stream processing system since the same account number ends up in different task managers and slots. We do however partition by account number when we send the events to downstream kafka sink so that transactions from the same account number end up in the same partition. This is however not good enough since the events are not sorted at the source.
>
> Any ideas for doing this is much appreciated.
>
>
> Best,
> Vishwas