Hi,
Assuming that I have a streaming job, using 30 task managers with 4 slot each. I want to change the parallelism of 1 operator from 120 to 30. Are there anyway so that each subtask of this operator get data from 4 upstream subtasks running in the same task manager, thus avoiding network completely ?
Best regards,
Kien
Sent from TypeApp
|
Hi,
It should work like this out of the box if you use rescale method: If it will not work, please let us know. Piotrek
|
Thanks Piotr, it works.
May I ask why default behavior when reducing parallelism is rebalance, and not rescale ?
Regards,
Kien
Sent from TypeApp
On Feb 5, 2018, at 15:28, Piotr Nowojski <[hidden email]> wrote:
Hi, |
Hi,
Rebalance is more safe default setting that protects against data skew. And even the smallest data skew can create a bottleneck much larger then the serialisation/network transfer cost. Especially if one changes the parallelism to a value that’s not a result of multiplication or division (like N down to N-1). And data skew can be arbitrarily large, while rebalance overhead compare to rescale is limited. Piotrek
|
Free forum by Nabble | Edit this page |