Hi, We are currently using Flink to process financial data. We are getting position data from Kafka and we enrich the positions with account and product information. We are using Ingestion time while processing events. The question I have is: say I key the position datasream by account number. If I have two consecutive Kafka messages with the same account and product info where the second one is an updated position of the first one, does Flink guarantee that the messages will be processed on the same slot in the same worker? We want to ensure that we don’t process them out of order. Thank you! -- Regards,
Harshvardhan |
Hi Harshvardhan, There are a number of factors to consider. 1. the consecutive Kafka messages must exist in a same topic of kafka. 2. the data should not been rebalanced. For example, operators should be chained in order to avoid rebalancing. 3. if you perform keyBy(), you should keyBy on a field the consecutive two messages share the same value. Best, Hequn On Sat, Jul 28, 2018 at 12:11 AM, Harshvardhan Agrawal <[hidden email]> wrote:
|
Hey, The messages will exist on the same topic. I intend to keyby on the same field. The question is that will the two messages be mapped to the same task manager and on the same slot. Also will they be processed in correct order given they have the same keys? On Fri, Jul 27, 2018 at 21:28 Hequn Cheng <[hidden email]> wrote:
Regards,
Harshvardhan |
Hi harshvardhan, If 1.the messages exist on the same topic and 2.there are no rebalance and 3.keyby on the same field with same value, the answer is yes. Best, Hequn On Sun, Jul 29, 2018 at 3:56 AM, Harshvardhan Agrawal <[hidden email]> wrote:
|
Hi, Maybe the messages of the same key should be in the same partition of Kafka topic2018-07-29 11:01 GMT+08:00 Hequn Cheng <[hidden email]>:
|
Hi, The basic thing is that you will only get the messages in a guaranteed order if the order is maintained in all steps from creation to use. In Kafka order is only guaranteed for messages in the same partition. So if you need them in order by account then the producing system must use the accountid as the key used to force a specific account into a specific kafka partition. Then the Flink Kafka source will read them sequentially in the right order, but in order to KEEP them in that order you should really to a keyby immediately after reading and used only keyedstreams from that point onwards. As soon as you do shuffle or key by a different key then the ordering within an account is no longer guaranteed. In general I always put a very accurate timestamp in all of my events (epoch milliseconds, in some cases even epoch microseconds) so I can always check if an order problem occurred. Niels Basjes On Sun, Jul 29, 2018 at 9:25 AM, Congxian Qiu <[hidden email]> wrote:
Best regards / Met vriendelijke groeten,
Niels Basjes |
Hi, Another way to ensure order is by adding a logical version number for each message so that earlier version will not override later version. Timestamp depends on your ntp server works correctly. On Sun, Jul 29, 2018 at 3:52 PM Niels Basjes <[hidden email]> wrote:
Liu, Renjie Software Engineer, MVAD |
Thanks for the response guys. Based on Niels response, it seems like a keyby immediately after reading from the source should map all messages with the account number on the same slot. On Sun, Jul 29, 2018 at 05:33 Renjie Liu <[hidden email]> wrote:
Regards,
Harshvardhan |
Free forum by Nabble | Edit this page |