Flink Rebalance

classic Classic list List threaded Threaded
6 messages Options
Reply | Threaded
Open this post in threaded view
|

Flink Rebalance

Antonio Saldivar Lezama
Hello

Does anyone know why when I add "rebalance()" to my .map steps is adding a lot of latency rather than not having rebalance.


I have kafka partitions in my topic 44 and 44 flink task manager

execution plan looks like this when I add rebalance but it is adding a lot of latency

kafka-src -> rebalance -> step1 -> rebalance ->step2->rebalance -> kafka-sink

Thank you 
regards

Reply | Threaded
Open this post in threaded view
|

Re: Flink Rebalance

Elias Levy
What do you consider a lot of latency?  The rebalance will require serializing / deserializing the data as it gets distributed.  Depending on the complexity of your records and the efficiency of your serializers, that could have a significant impact on your performance.

On Thu, Aug 9, 2018 at 2:14 PM antonio saldivar <[hidden email]> wrote:
Hello

Does anyone know why when I add "rebalance()" to my .map steps is adding a lot of latency rather than not having rebalance.


I have kafka partitions in my topic 44 and 44 flink task manager

execution plan looks like this when I add rebalance but it is adding a lot of latency

kafka-src -> rebalance -> step1 -> rebalance ->step2->rebalance -> kafka-sink

Thank you 
regards

Reply | Threaded
Open this post in threaded view
|

Re: Flink Rebalance

Antonio Saldivar Lezama
Hello

Sending ~450 elements per second ( the values are in milliseconds start to end)
I went from:
with Rebalance

+------------+

AVGWINDOW  |

+------------+

32131.0853   |

+------------+


to this without rebalance

+------------+

| AVGWINDOW  |

+------------+

| 70.2077    |

+------------+


El jue., 9 ago. 2018 a las 17:42, Elias Levy (<[hidden email]>) escribió:
What do you consider a lot of latency?  The rebalance will require serializing / deserializing the data as it gets distributed.  Depending on the complexity of your records and the efficiency of your serializers, that could have a significant impact on your performance.

On Thu, Aug 9, 2018 at 2:14 PM antonio saldivar <[hidden email]> wrote:
Hello

Does anyone know why when I add "rebalance()" to my .map steps is adding a lot of latency rather than not having rebalance.


I have kafka partitions in my topic 44 and 44 flink task manager

execution plan looks like this when I add rebalance but it is adding a lot of latency

kafka-src -> rebalance -> step1 -> rebalance ->step2->rebalance -> kafka-sink

Thank you 
regards

Reply | Threaded
Open this post in threaded view
|

Re: Flink Rebalance

Paul Lam
Hi Antonio, 

AFAIK, there are two reasons for this: 

1. Rebalancing itself brings latency because it takes time to redistribute the elements. 
2. Rebalancing also messes up the order in the Kafka topic partitions, and often makes a event-time window wait longer to trigger in case you’re using event time characteristic. 

Best Regards,
Paul Lam


在 2018年8月10日,05:49,antonio saldivar <[hidden email]> 写道:

Hello

Sending ~450 elements per second ( the values are in milliseconds start to end)
I went from:
with Rebalance
+------------+
AVGWINDOW  |
+------------+
32131.0853   |
+------------+

to this without rebalance

+------------+
| AVGWINDOW  |
+------------+
| 70.2077    |
+------------+

El jue., 9 ago. 2018 a las 17:42, Elias Levy (<[hidden email]>) escribió:
What do you consider a lot of latency?  The rebalance will require serializing / deserializing the data as it gets distributed.  Depending on the complexity of your records and the efficiency of your serializers, that could have a significant impact on your performance.

On Thu, Aug 9, 2018 at 2:14 PM antonio saldivar <[hidden email]> wrote:
Hello

Does anyone know why when I add "rebalance()" to my .map steps is adding a lot of latency rather than not having rebalance.


I have kafka partitions in my topic 44 and 44 flink task manager

execution plan looks like this when I add rebalance but it is adding a lot of latency

kafka-src -> rebalance -> step1 -> rebalance ->step2->rebalance -> kafka-sink

Thank you 
regards


Reply | Threaded
Open this post in threaded view
|

Re: Flink Rebalance

Fabian Hueske-2
Hi,

Elias and Paul have good points.
I think the performance degradation is mostly to the lack of function chaining in the rebalance case.

If all steps are just map functions, they can be chained in the no-rebalance case.
That means, records are passed via function calls.
If you add rebalancing, records will be passed between map functions via serialization, network transfer, and deserialization.
This is of course much more expensive than calling a method.

Best, Fabian

2018-08-10 4:25 GMT+02:00 Paul Lam <[hidden email]>:
Hi Antonio, 

AFAIK, there are two reasons for this: 

1. Rebalancing itself brings latency because it takes time to redistribute the elements. 
2. Rebalancing also messes up the order in the Kafka topic partitions, and often makes a event-time window wait longer to trigger in case you’re using event time characteristic. 

Best Regards,
Paul Lam



在 2018年8月10日,05:49,antonio saldivar <[hidden email]> 写道:

Hello

Sending ~450 elements per second ( the values are in milliseconds start to end)
I went from:
with Rebalance
+------------+
AVGWINDOW  |
+------------+
32131.0853   |
+------------+

to this without rebalance

+------------+
| AVGWINDOW  |
+------------+
| 70.2077    |
+------------+

El jue., 9 ago. 2018 a las 17:42, Elias Levy (<[hidden email]>) escribió:
What do you consider a lot of latency?  The rebalance will require serializing / deserializing the data as it gets distributed.  Depending on the complexity of your records and the efficiency of your serializers, that could have a significant impact on your performance.

On Thu, Aug 9, 2018 at 2:14 PM antonio saldivar <[hidden email]> wrote:
Hello

Does anyone know why when I add "rebalance()" to my .map steps is adding a lot of latency rather than not having rebalance.


I have kafka partitions in my topic 44 and 44 flink task manager

execution plan looks like this when I add rebalance but it is adding a lot of latency

kafka-src -> rebalance -> step1 -> rebalance ->step2->rebalance -> kafka-sink

Thank you 
regards



Reply | Threaded
Open this post in threaded view
|

Re: Flink Rebalance

Antonio Saldivar Lezama
Hi Fabian

Thank you, yes there are just map functions, i will do it that way with methods to get it faster

On Fri, Aug 10, 2018, 5:58 AM Fabian Hueske <[hidden email]> wrote:
Hi,

Elias and Paul have good points.
I think the performance degradation is mostly to the lack of function chaining in the rebalance case.

If all steps are just map functions, they can be chained in the no-rebalance case.
That means, records are passed via function calls.
If you add rebalancing, records will be passed between map functions via serialization, network transfer, and deserialization.
This is of course much more expensive than calling a method.

Best, Fabian

2018-08-10 4:25 GMT+02:00 Paul Lam <[hidden email]>:
Hi Antonio, 

AFAIK, there are two reasons for this: 

1. Rebalancing itself brings latency because it takes time to redistribute the elements. 
2. Rebalancing also messes up the order in the Kafka topic partitions, and often makes a event-time window wait longer to trigger in case you’re using event time characteristic. 

Best Regards,
Paul Lam



在 2018年8月10日,05:49,antonio saldivar <[hidden email]> 写道:

Hello

Sending ~450 elements per second ( the values are in milliseconds start to end)
I went from:
with Rebalance
+------------+
AVGWINDOW  |
+------------+
32131.0853   |
+------------+

to this without rebalance

+------------+
| AVGWINDOW  |
+------------+
| 70.2077    |
+------------+

El jue., 9 ago. 2018 a las 17:42, Elias Levy (<[hidden email]>) escribió:
What do you consider a lot of latency?  The rebalance will require serializing / deserializing the data as it gets distributed.  Depending on the complexity of your records and the efficiency of your serializers, that could have a significant impact on your performance.

On Thu, Aug 9, 2018 at 2:14 PM antonio saldivar <[hidden email]> wrote:
Hello

Does anyone know why when I add "rebalance()" to my .map steps is adding a lot of latency rather than not having rebalance.


I have kafka partitions in my topic 44 and 44 flink task manager

execution plan looks like this when I add rebalance but it is adding a lot of latency

kafka-src -> rebalance -> step1 -> rebalance ->step2->rebalance -> kafka-sink

Thank you 
regards