Hello all,
If an event is available now to Flink keyedcoprocess operator, and if another event will be available 1 minute later to that operator (same key), as a result of connecting the two streams, Flink does not provide any guarantee that the event available now will be processed (processElement1) before the event available 1 minute later (processElement2)? is that accurate? And if that is the case why Flink would do that maybe is counter intuitive. Any technical limitations that would forces this out of order/time scenario? Thanks again. -- Sent from: http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/ |
With Keyed dual stream processing, you make sure that events for same key to processElement 1 and 2 are received to same partition.
However, when you receive an event in processElement1, you should store that in flinks state so that if an another event arrives on delay to processElement2, you can do your computations/joining of two events on same key from two streams. In this way, it can
handle the largest delay.
From: Mazen Ezzeddine <[hidden email]>
Sent: 16 September 2020 13:37 To: [hidden email] <[hidden email]> Subject: Connecting two streams and order of their processing Hello all,
If an event is available now to Flink keyedcoprocess operator, and if another event will be available 1 minute later to that operator (same key), as a result of connecting the two streams, Flink does not provide any guarantee that the event available now will be processed (processElement1) before the event available 1 minute later (processElement2)? is that accurate? And if that is the case why Flink would do that maybe is counter intuitive. Any technical limitations that would forces this out of order/time scenario? Thanks again. -- Sent from: http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/ |
In reply to this post by Mazen Ezzeddine
The details of what can go wrong will vary depending on the precise scenario, but no, Flink is unable to provide any such guarantee. Doing so would require being able to control the scheduling of various threads running on different machines, which isn't possible. Of course, if event A becomes available to consume from one Kafka topic a minute before an event B becomes available on some other topic (for example), it's very unlikely that event A will experience so much latency that a keyedcoprocess operator will receive B before A. But nothing guarantees this is impossible. Lots of things can cause processing hiccups: network contention, garbage collection, CPU load, redeployments, etc. Regards, David On Wed, Sep 16, 2020 at 10:07 AM Mazen Ezzeddine <[hidden email]> wrote: Hello all, |
Free forum by Nabble | Edit this page |