Order events by field that does not represent time

KristoffSC
Hi,
Is it possible to use a field that does not represent a timestamp to order
events in Flink's pipeline?

In other words, I will receive a stream of events that have a sequence
number (gaps are possible).
Can I maintain the order of those events based on this field, the same way
I would for a field that represents time?

Regards,
Krzysztof



--
Sent from: http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/
Re: Order events by field that does not represent time

Timo Walther
Hi Krzysztof,

First of all, Flink does not sort events by timestamp. Watermarks only
postpone the triggering of a time-based operation until the watermark
indicates that all events up to a time t have arrived.

For your problem, you can simply use a ProcessFunction and buffer the
events in state until some condition is met. Once the condition is met,
you sort the data and emit what is allowed to be emitted.
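
The buffering idea above can be sketched in plain Java, without Flink's API, so that it runs on its own. The class `SequenceSortBuffer` and its methods are hypothetical names used only for illustration. The sketch also assumes that the release condition is a watermark-like threshold supplied by the caller; in a real ProcessFunction the buffer would live in Flink state rather than in a field.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NavigableMap;
import java.util.TreeMap;

/**
 * Hypothetical helper (not a Flink class): buffers events by sequence number
 * and releases them in ascending order once a threshold has been reached.
 */
final class SequenceSortBuffer<T> {
    // Sorted by sequence number. Assumption: sequence numbers are unique,
    // so a duplicate simply overwrites the earlier entry.
    private final TreeMap<Long, T> buffer = new TreeMap<>();

    /** Stores one event until it is safe to emit it. */
    void add(long sequenceNumber, T event) {
        buffer.put(sequenceNumber, event);
    }

    /**
     * Called when the condition is met: everything with a sequence number
     * less than or equal to the threshold is assumed complete and is emitted in order.
     */
    List<T> onWatermark(long watermark) {
        // headMap returns a live view, so clearing it removes the entries
        // from the underlying buffer as well.
        NavigableMap<Long, T> ready = buffer.headMap(watermark, true);
        List<T> out = new ArrayList<>(ready.values());
        ready.clear();
        return out;
    }

    /** Number of events still waiting in the buffer. */
    int pending() {
        return buffer.size();
    }
}
```

Gaps in the sequence numbers do not matter here, because the buffer only compares keys against the threshold and never expects consecutive values.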

You can also take a look at how Flink SQL's event time sort is
implemented. It may not be the easiest implementation to read, but it is
useful for understanding the concepts of time and state.

org.apache.flink.table.runtime.aggregate.RowTimeSortProcessFunction

I hope this helps.

Timo



On 10.12.19 17:15, KristoffSC wrote:

> Is it possible to use a field that does not represent a timestamp to order
> events in Flink's pipeline?

Re: Order events by field that does not represent time

David Anderson-2
In reply to this post by KristoffSC
Krzysztof,

Note that if you want Flink to treat these sequence numbers as event time timestamps, you probably can, as long as they are generally increasing and there is some bound on how out-of-order they can be.

The advantage of doing this is that you might be able to use Flink SQL's event time sorting directly, rather than implementing something yourself. For this to work you will need to specify watermarking, which should be feasible as long as there is some bound on the out-of-orderness of the sequence numbers.
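
To make the watermarking requirement concrete, here is a small plain-Java sketch of how a watermark could be derived from sequence numbers under a bounded out-of-orderness assumption. `SequenceWatermarkTracker` is a hypothetical name, not a Flink class, and the bound of 2 in the usage note is an arbitrary example value. The sketch assumes that an event never arrives more than `maxOutOfOrderness` sequence positions behind the highest sequence number seen so far.

```java
/**
 * Hypothetical helper (not a Flink class): tracks the highest sequence number
 * seen and derives a watermark from it, treating sequence numbers like timestamps.
 */
final class SequenceWatermarkTracker {
    private final long maxOutOfOrderness;
    private long maxSeenSeq;
    private boolean seenAny = false;

    SequenceWatermarkTracker(long maxOutOfOrderness) {
        this.maxOutOfOrderness = maxOutOfOrderness;
    }

    /**
     * Current watermark: all events with a sequence number less than or equal to
     * this value are assumed to have arrived. An event exactly maxOutOfOrderness
     * behind the maximum may still arrive, hence the extra -1.
     */
    long currentWatermark() {
        return seenAny ? maxSeenSeq - maxOutOfOrderness - 1 : Long.MIN_VALUE;
    }

    /**
     * Registers an event and reports whether it was late, that is, whether it
     * arrived at or below the watermark and so violates the assumed bound.
     */
    boolean onEvent(long sequenceNumber) {
        boolean late = sequenceNumber <= currentWatermark();
        if (!seenAny || sequenceNumber > maxSeenSeq) {
            maxSeenSeq = sequenceNumber;
            seenAny = true;
        }
        return late;
    }
}
```

With a bound of 2, seeing sequence number 10 puts the watermark at 7, so a later event 8 is still on time while an event 7 counts as late. A jump to 15 simply moves the watermark to 12, which is why gaps in the sequence are harmless.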

David

On Tue, Dec 10, 2019 at 5:09 PM KristoffSC wrote:
> Is it possible to use a field that does not represent a timestamp to order
> events in Flink's pipeline?