Connecting two streams and order of their processing

classic Classic list List threaded Threaded
3 messages Options
Reply | Threaded
Open this post in threaded view
|

Connecting two streams and order of their processing

Mazen Ezzeddine
Hello all,

If an event is available now to Flink keyedcoprocess operator, and if
another event will be available 1 minute later to that operator (same key),
as a result of connecting the two streams, Flink does not provide any
guarantee that the event available now will be processed (processElement1)
before the event available 1 minute later (processElement2)? is that
accurate?

And if that is the case why Flink would do that maybe is counter intuitive.
Any technical limitations that would forces this out of order/time scenario?

Thanks again.



--
Sent from: http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/
Reply | Threaded
Open this post in threaded view
|

Re: Connecting two streams and order of their processing

jaswin.shah@outlook.com
With Keyed dual stream processing, you make sure that events for same key to processElement 1 and 2 are received to same partition. However, when you receive an event in processElement1, you should store that in flinks state so that if an another event arrives on delay to processElement2, you can do your computations/joining of two events on same key from two streams. In this way, it can handle the largest delay.

From: Mazen Ezzeddine <[hidden email]>
Sent: 16 September 2020 13:37
To: [hidden email] <[hidden email]>
Subject: Connecting two streams and order of their processing
 
Hello all,

If an event is available now to Flink keyedcoprocess operator, and if
another event will be available 1 minute later to that operator (same key),
as a result of connecting the two streams, Flink does not provide any
guarantee that the event available now will be processed (processElement1)
before the event available 1 minute later (processElement2)? is that
accurate?

And if that is the case why Flink would do that maybe is counter intuitive.
Any technical limitations that would forces this out of order/time scenario?

Thanks again.



--
Sent from: http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/
Reply | Threaded
Open this post in threaded view
|

Re: Connecting two streams and order of their processing

David Anderson-3
In reply to this post by Mazen Ezzeddine
The details of what can go wrong will vary depending on the precise scenario, but no, Flink is unable to provide any such guarantee. Doing so would require being able to control the scheduling of various threads running on different machines, which isn't possible.

Of course, if event A becomes available to consume from one Kafka topic a minute before an event B becomes available on some other topic (for example), it's very unlikely that event A will experience so much latency that a keyedcoprocess operator will receive B before A. But nothing guarantees this is impossible. Lots of things can cause processing hiccups: network contention, garbage collection, CPU load, redeployments, etc. 

Regards,
David

On Wed, Sep 16, 2020 at 10:07 AM Mazen Ezzeddine <[hidden email]> wrote:
Hello all,

If an event is available now to Flink keyedcoprocess operator, and if
another event will be available 1 minute later to that operator (same key),
as a result of connecting the two streams, Flink does not provide any
guarantee that the event available now will be processed (processElement1)
before the event available 1 minute later (processElement2)? is that
accurate?

And if that is the case why Flink would do that maybe is counter intuitive.
Any technical limitations that would forces this out of order/time scenario?

Thanks again.



--
Sent from: http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/