(DEPRECATED) Apache Flink User Mailing List archive.

Order of events in a Keyed Stream

Classic

List

Threaded

8 messages Options

Harshvardhan Agrawal

Order of events in a Keyed Stream

Hi,

We are currently using Flink to process financial data. We are getting position data from Kafka and we enrich the positions with account and product information. We are using Ingestion time while processing events. The question I have is: say I key the position datasream by account number. If I have two consecutive Kafka messages with the same account and product info where the second one is an updated position of the first one, does Flink guarantee that the messages will be processed on the same slot in the same worker? We want to ensure that we don’t process them out of order.

Thank you!

Regards,
Harshvardhan

Hequn Cheng

Re: Order of events in a Keyed Stream

Hi Harshvardhan,

There are a number of factors to consider.

1. the consecutive Kafka messages must exist in a same topic of kafka.

2. the data should not been rebalanced. For example, operators should be chained in order to avoid rebalancing.

3. if you perform keyBy(), you should keyBy on a field the consecutive two messages share the same value.

Best, Hequn

On Sat, Jul 28, 2018 at 12:11 AM, Harshvardhan Agrawal <[hidden email]> wrote:

Hi,

We are currently using Flink to process financial data. We are getting position data from Kafka and we enrich the positions with account and product information. We are using Ingestion time while processing events. The question I have is: say I key the position datasream by account number. If I have two consecutive Kafka messages with the same account and product info where the second one is an updated position of the first one, does Flink guarantee that the messages will be processed on the same slot in the same worker? We want to ensure that we don’t process them out of order.

Thank you!
--
Regards,
Harshvardhan

Harshvardhan Agrawal

Re: Order of events in a Keyed Stream

Hey,

The messages will exist on the same topic. I intend to keyby on the same field. The question is that will the two messages be mapped to the same task manager and on the same slot. Also will they be processed in correct order given they have the same keys?

On Fri, Jul 27, 2018 at 21:28 Hequn Cheng <[hidden email]> wrote:

Hi Harshvardhan,

There are a number of factors to consider.
1. the consecutive Kafka messages must exist in a same topic of kafka.
2. the data should not been rebalanced. For example, operators should be chained in order to avoid rebalancing.
3. if you perform keyBy(), you should keyBy on a field the consecutive two messages share the same value.

Best, Hequn

On Sat, Jul 28, 2018 at 12:11 AM, Harshvardhan Agrawal <[hidden email]> wrote:
Hi,

We are currently using Flink to process financial data. We are getting position data from Kafka and we enrich the positions with account and product information. We are using Ingestion time while processing events. The question I have is: say I key the position datasream by account number. If I have two consecutive Kafka messages with the same account and product info where the second one is an updated position of the first one, does Flink guarantee that the messages will be processed on the same slot in the same worker? We want to ensure that we don’t process them out of order.

Thank you!
--
Regards,
Harshvardhan

Regards,
Harshvardhan

Hequn Cheng

Re: Order of events in a Keyed Stream

Hi harshvardhan,

If 1.the messages exist on the same topic and 2.there are no rebalance and 3.keyby on the same field with same value, the answer is yes.

Best, Hequn

On Sun, Jul 29, 2018 at 3:56 AM, Harshvardhan Agrawal <[hidden email]> wrote:

Hey,

The messages will exist on the same topic. I intend to keyby on the same field. The question is that will the two messages be mapped to the same task manager and on the same slot. Also will they be processed in correct order given they have the same keys?

On Fri, Jul 27, 2018 at 21:28 Hequn Cheng <[hidden email]> wrote:
Hi Harshvardhan,

There are a number of factors to consider.
1. the consecutive Kafka messages must exist in a same topic of kafka.
2. the data should not been rebalanced. For example, operators should be chained in order to avoid rebalancing.
3. if you perform keyBy(), you should keyBy on a field the consecutive two messages share the same value.

Best, Hequn

On Sat, Jul 28, 2018 at 12:11 AM, Harshvardhan Agrawal <[hidden email]> wrote:
Hi,

We are currently using Flink to process financial data. We are getting position data from Kafka and we enrich the positions with account and product information. We are using Ingestion time while processing events. The question I have is: say I key the position datasream by account number. If I have two consecutive Kafka messages with the same account and product info where the second one is an updated position of the first one, does Flink guarantee that the messages will be processed on the same slot in the same worker? We want to ensure that we don’t process them out of order.

Thank you!
--
Regards,
Harshvardhan

--
Regards,
Harshvardhan

Congxian Qiu

Re: Order of events in a Keyed Stream

Hi,

Maybe the messages of the same key should be in the same partition of Kafka topic

2018-07-29 11:01 GMT+08:00 Hequn Cheng <[hidden email]>:

Hi harshvardhan,
If 1.the messages exist on the same topic and 2.there are no rebalance and 3.keyby on the same field with same value, the answer is yes.

Best, Hequn

On Sun, Jul 29, 2018 at 3:56 AM, Harshvardhan Agrawal <[hidden email]> wrote:
Hey,

The messages will exist on the same topic. I intend to keyby on the same field. The question is that will the two messages be mapped to the same task manager and on the same slot. Also will they be processed in correct order given they have the same keys?

On Fri, Jul 27, 2018 at 21:28 Hequn Cheng <[hidden email]> wrote:
Hi Harshvardhan,

There are a number of factors to consider.
1. the consecutive Kafka messages must exist in a same topic of kafka.
2. the data should not been rebalanced. For example, operators should be chained in order to avoid rebalancing.
3. if you perform keyBy(), you should keyBy on a field the consecutive two messages share the same value.

Best, Hequn

On Sat, Jul 28, 2018 at 12:11 AM, Harshvardhan Agrawal <[hidden email]> wrote:
Hi,

We are currently using Flink to process financial data. We are getting position data from Kafka and we enrich the positions with account and product information. We are using Ingestion time while processing events. The question I have is: say I key the position datasream by account number. If I have two consecutive Kafka messages with the same account and product info where the second one is an updated position of the first one, does Flink guarantee that the messages will be processed on the same slot in the same worker? We want to ensure that we don’t process them out of order.

Thank you!
--
Regards,
Harshvardhan

--
Regards,
Harshvardhan

Blog:http://www.klion26.com

GTalk:qcx978132955

一切随心

Niels Basjes

Re: Order of events in a Keyed Stream

Hi,

The basic thing is that you will only get the messages in a guaranteed order if the order is maintained in all steps from creation to use.

In Kafka order is only guaranteed for messages in the same partition.

So if you need them in order by account then the producing system must use the accountid as the key used to force a specific account into a specific kafka partition.

Then the Flink Kafka source will read them sequentially in the right order, but in order to KEEP them in that order you should really to a keyby immediately after reading and used only keyedstreams from that point onwards.

As soon as you do shuffle or key by a different key then the ordering within an account is no longer guaranteed.

In general I always put a very accurate timestamp in all of my events (epoch milliseconds, in some cases even epoch microseconds) so I can always check if an order problem occurred.

Niels Basjes

On Sun, Jul 29, 2018 at 9:25 AM, Congxian Qiu <[hidden email]> wrote:

Hi,
Maybe the messages of the same key should be in the same partition of Kafka topic

2018-07-29 11:01 GMT+08:00 Hequn Cheng <[hidden email]>:
Hi harshvardhan,
If 1.the messages exist on the same topic and 2.there are no rebalance and 3.keyby on the same field with same value, the answer is yes.

Best, Hequn

On Sun, Jul 29, 2018 at 3:56 AM, Harshvardhan Agrawal <[hidden email]> wrote:
Hey,

The messages will exist on the same topic. I intend to keyby on the same field. The question is that will the two messages be mapped to the same task manager and on the same slot. Also will they be processed in correct order given they have the same keys?

On Fri, Jul 27, 2018 at 21:28 Hequn Cheng <[hidden email]> wrote:
Hi Harshvardhan,

There are a number of factors to consider.
1. the consecutive Kafka messages must exist in a same topic of kafka.
2. the data should not been rebalanced. For example, operators should be chained in order to avoid rebalancing.
3. if you perform keyBy(), you should keyBy on a field the consecutive two messages share the same value.

Best, Hequn

On Sat, Jul 28, 2018 at 12:11 AM, Harshvardhan Agrawal <[hidden email]> wrote:
Hi,

We are currently using Flink to process financial data. We are getting position data from Kafka and we enrich the positions with account and product information. We are using Ingestion time while processing events. The question I have is: say I key the position datasream by account number. If I have two consecutive Kafka messages with the same account and product info where the second one is an updated position of the first one, does Flink guarantee that the messages will be processed on the same slot in the same worker? We want to ensure that we don’t process them out of order.

Thank you!
--
Regards,
Harshvardhan

--
Regards,
Harshvardhan

--
Blog:http://www.klion26.com
GTalk:qcx978132955
一切随心

Best regards / Met vriendelijke groeten,

Niels Basjes

Renjie Liu

Re: Order of events in a Keyed Stream

Hi,

Another way to ensure order is by adding a logical version number for each message so that earlier version will not override later version. Timestamp depends on your ntp server works correctly.

On Sun, Jul 29, 2018 at 3:52 PM Niels Basjes <[hidden email]> wrote:

Hi,

The basic thing is that you will only get the messages in a guaranteed order if the order is maintained in all steps from creation to use.
In Kafka order is only guaranteed for messages in the same partition.
So if you need them in order by account then the producing system must use the accountid as the key used to force a specific account into a specific kafka partition.
Then the Flink Kafka source will read them sequentially in the right order, but in order to KEEP them in that order you should really to a keyby immediately after reading and used only keyedstreams from that point onwards.
As soon as you do shuffle or key by a different key then the ordering within an account is no longer guaranteed.

In general I always put a very accurate timestamp in all of my events (epoch milliseconds, in some cases even epoch microseconds) so I can always check if an order problem occurred.

Niels Basjes

On Sun, Jul 29, 2018 at 9:25 AM, Congxian Qiu <[hidden email]> wrote:
Hi,
Maybe the messages of the same key should be in the same partition of Kafka topic

2018-07-29 11:01 GMT+08:00 Hequn Cheng <[hidden email]>:
Hi harshvardhan,
If 1.the messages exist on the same topic and 2.there are no rebalance and 3.keyby on the same field with same value, the answer is yes.

Best, Hequn

On Sun, Jul 29, 2018 at 3:56 AM, Harshvardhan Agrawal <[hidden email]> wrote:
Hey,

The messages will exist on the same topic. I intend to keyby on the same field. The question is that will the two messages be mapped to the same task manager and on the same slot. Also will they be processed in correct order given they have the same keys?

On Fri, Jul 27, 2018 at 21:28 Hequn Cheng <[hidden email]> wrote:
Hi Harshvardhan,

There are a number of factors to consider.
1. the consecutive Kafka messages must exist in a same topic of kafka.
2. the data should not been rebalanced. For example, operators should be chained in order to avoid rebalancing.
3. if you perform keyBy(), you should keyBy on a field the consecutive two messages share the same value.

Best, Hequn

On Sat, Jul 28, 2018 at 12:11 AM, Harshvardhan Agrawal <[hidden email]> wrote:
Hi,

We are currently using Flink to process financial data. We are getting position data from Kafka and we enrich the positions with account and product information. We are using Ingestion time while processing events. The question I have is: say I key the position datasream by account number. If I have two consecutive Kafka messages with the same account and product info where the second one is an updated position of the first one, does Flink guarantee that the messages will be processed on the same slot in the same worker? We want to ensure that we don’t process them out of order.

Thank you!
--
Regards,
Harshvardhan

--
Regards,
Harshvardhan

--
Blog:http://www.klion26.com
GTalk:qcx978132955
一切随心

--
Best regards / Met vriendelijke groeten,

Niels Basjes

Liu, Renjie

Software Engineer, MVAD

Harshvardhan Agrawal

Re: Order of events in a Keyed Stream

Thanks for the response guys.

Based on Niels response, it seems like a keyby immediately after reading from the source should map all messages with the account number on the same slot.

On Sun, Jul 29, 2018 at 05:33 Renjie Liu <[hidden email]> wrote:

Hi,
Another way to ensure order is by adding a logical version number for each message so that earlier version will not override later version. Timestamp depends on your ntp server works correctly.

On Sun, Jul 29, 2018 at 3:52 PM Niels Basjes <[hidden email]> wrote:
Hi,

The basic thing is that you will only get the messages in a guaranteed order if the order is maintained in all steps from creation to use.
In Kafka order is only guaranteed for messages in the same partition.
So if you need them in order by account then the producing system must use the accountid as the key used to force a specific account into a specific kafka partition.
Then the Flink Kafka source will read them sequentially in the right order, but in order to KEEP them in that order you should really to a keyby immediately after reading and used only keyedstreams from that point onwards.
As soon as you do shuffle or key by a different key then the ordering within an account is no longer guaranteed.

In general I always put a very accurate timestamp in all of my events (epoch milliseconds, in some cases even epoch microseconds) so I can always check if an order problem occurred.

Niels Basjes

On Sun, Jul 29, 2018 at 9:25 AM, Congxian Qiu <[hidden email]> wrote:
Hi,
Maybe the messages of the same key should be in the same partition of Kafka topic

2018-07-29 11:01 GMT+08:00 Hequn Cheng <[hidden email]>:
Hi harshvardhan,
If 1.the messages exist on the same topic and 2.there are no rebalance and 3.keyby on the same field with same value, the answer is yes.

Best, Hequn

On Sun, Jul 29, 2018 at 3:56 AM, Harshvardhan Agrawal <[hidden email]> wrote:
Hey,

The messages will exist on the same topic. I intend to keyby on the same field. The question is that will the two messages be mapped to the same task manager and on the same slot. Also will they be processed in correct order given they have the same keys?

On Fri, Jul 27, 2018 at 21:28 Hequn Cheng <[hidden email]> wrote:
Hi Harshvardhan,

There are a number of factors to consider.
1. the consecutive Kafka messages must exist in a same topic of kafka.
2. the data should not been rebalanced. For example, operators should be chained in order to avoid rebalancing.
3. if you perform keyBy(), you should keyBy on a field the consecutive two messages share the same value.

Best, Hequn

On Sat, Jul 28, 2018 at 12:11 AM, Harshvardhan Agrawal <[hidden email]> wrote:
Hi,

We are currently using Flink to process financial data. We are getting position data from Kafka and we enrich the positions with account and product information. We are using Ingestion time while processing events. The question I have is: say I key the position datasream by account number. If I have two consecutive Kafka messages with the same account and product info where the second one is an updated position of the first one, does Flink guarantee that the messages will be processed on the same slot in the same worker? We want to ensure that we don’t process them out of order.

Thank you!
--
Regards,
Harshvardhan

--
Regards,
Harshvardhan

--
Blog:http://www.klion26.com
GTalk:qcx978132955
一切随心

--
Best regards / Met vriendelijke groeten,

Niels Basjes

--
Liu, Renjie
Software Engineer, MVAD

Regards,
Harshvardhan