event ordering


Christophe Jolif
Hi everyone,

Let's imagine I have a stream of events coming a bit like this:

{ id: "1", value: 1, timestamp: 1 }
{ id: "2", value: 2, timestamp: 1 }
{ id: "1", value: 4, timestamp: 3 }
{ id: "1", value: 5, timestamp: 2 }
{ id: "2", value: 5, timestamp: 3 }
...

As you can see from the non-monotonically increasing timestamps, events can arrive slightly out of order, for various reasons.

Now I want to use Flink to process this stream and compute, per id (my key), the latest value and update it in a DB. But obviously "latest" must be determined by the original event timestamp, not the processing timestamp.

I've seen that Flink can deal with event-time processing, in the sense that if I need to do a windowed operation I can ensure an event will be assigned to the "correct" window.

But here the use-case seems slightly different. How would you proceed to do that in Flink?

Thanks!
--
Christophe
Re: event ordering

Fabian Hueske-2
Hi Christophe,

Flink exposes event time and watermarks not only in windows.
The easiest solution would be to use a ProcessFunction [1] which can access the timestamp of a record.

I would apply a ProcessFunction on a keyed stream (keyBy(id) [2]) and store the max timestamp per key in a ValueState [3].
When you receive a new record, you check if its timestamp is larger than the timestamp in the state. If that is the case, you update the state and forward the record. Otherwise, you drop the record.
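This logic can be sketched in plain Java without Flink dependencies (the Event class and all names here are illustrative, not Flink API; in Flink the HashMap below would be a per-key ValueState<Long> inside a ProcessFunction applied after keyBy(id)):

```java
import java.util.HashMap;
import java.util.Map;

public class LatestValueFilter {
    // Illustrative event type matching the example stream.
    static class Event {
        final String id; final int value; final long timestamp;
        Event(String id, int value, long timestamp) {
            this.id = id; this.value = value; this.timestamp = timestamp;
        }
    }

    // Per-key max timestamp seen so far (a ValueState<Long> in Flink,
    // where Flink scopes the state to the current key automatically).
    private final Map<String, Long> maxTsPerKey = new HashMap<>();

    // Forward the event if its timestamp advances the per-key maximum;
    // otherwise drop it as an out-of-order, stale update.
    public boolean accept(Event e) {
        Long maxTs = maxTsPerKey.get(e.id);
        if (maxTs == null || e.timestamp > maxTs) {
            maxTsPerKey.put(e.id, e.timestamp);
            return true;  // forward: this is the newest value for this id
        }
        return false;     // drop: an event with a later timestamp was already seen
    }

    public static void main(String[] args) {
        LatestValueFilter filter = new LatestValueFilter();
        Event[] stream = {
            new Event("1", 1, 1), new Event("2", 2, 1),
            new Event("1", 4, 3),
            new Event("1", 5, 2),  // ts=2 arrives after ts=3 for id "1"
            new Event("2", 5, 3),
        };
        for (Event e : stream) {
            System.out.println(e.id + ":" + e.value + " -> "
                + (filter.accept(e) ? "forward" : "drop"));
        }
    }
}
```

On the example stream this forwards everything except { id: "1", value: 5, timestamp: 2 }, which is dropped because timestamp 3 was already seen for that key, so the DB ends up with value 4 for id "1" and value 5 for id "2".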

Hope that helps,
Fabian

[1] https://ci.apache.org/projects/flink/flink-docs-release-1.4/dev/stream/operators/process_function.html
[2] https://ci.apache.org/projects/flink/flink-docs-release-1.4/dev/stream/operators/index.html#datastream-transformations
[3] https://ci.apache.org/projects/flink/flink-docs-release-1.4/dev/stream/state/state.html#using-managed-keyed-state

2018-01-09 12:23 GMT+01:00 Christophe Jolif <[hidden email]>:
[quoted message elided]