Hello,
I am running a streaming Beam app with the Flink runner (Java).
Checkpoints and savepoints are configured to go to S3, and HA is enabled using ZooKeeper.
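For reference, the relevant parts of flink-conf.yaml look roughly like this (bucket names and the ZooKeeper quorum are placeholders):

state.checkpoints.dir: s3://<bucket>/checkpoints
state.savepoints.dir: s3://<bucket>/savepoints
high-availability: zookeeper
high-availability.zookeeper.quorum: <zk-host-1>:2181,<zk-host-2>:2181,<zk-host-3>:2181
high-availability.storageDir: s3://<bucket>/ha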
I was running the app with 3 task managers. I took a savepoint and started the app again with 6 task managers. My input topic has 12 partitions. With 3 pods, the partitions were distributed evenly across the task managers, but after restarting with the increased number of task managers, some partitions are being consumed by 2 task managers:
partition | task manager
----------|-------------
0         | 4
1         | 4
2         | 1
3         | 1
4         | 3
5         | 3
6         | 4, 0
7         | 4, 0
8         | 1, 2
9         | 1, 2
10        | 3, 5
11        | 3, 5
It looks like the original 3 task managers [1, 3, 4] had the partitions correctly distributed among them. Then, when the 3 new task managers [0, 2, 5] were added, the partitions were not properly redistributed.
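For reference, the rescale followed the usual savepoint-and-restore flow, roughly like this (job ID, bucket, and parallelism below are placeholders):

flink savepoint <jobId> s3://<bucket>/savepoints
flink cancel <jobId>
flink run -s s3://<bucket>/savepoints/savepoint-<id> -p <new-parallelism> beam-pipeline.jar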
Where could this partition-assignment metadata be coming from?
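In case it matters, the Kafka source in the pipeline is set up roughly like this (broker addresses and the topic name are placeholders, and the rest of the pipeline is omitted):

import org.apache.beam.sdk.Pipeline;
import org.apache.beam.sdk.io.kafka.KafkaIO;
import org.apache.beam.sdk.options.PipelineOptions;
import org.apache.beam.sdk.options.PipelineOptionsFactory;
import org.apache.kafka.common.serialization.StringDeserializer;

public class StreamingJob {
  public static void main(String[] args) {
    PipelineOptions options = PipelineOptionsFactory.fromArgs(args).create();
    Pipeline p = Pipeline.create(options);

    // Kafka source; brokers and topic name are placeholders.
    p.apply("ReadFromKafka",
        KafkaIO.<String, String>read()
            .withBootstrapServers("<broker-1>:9092,<broker-2>:9092")
            .withTopic("<input-topic>")   // the 12-partition input topic
            .withKeyDeserializer(StringDeserializer.class)
            .withValueDeserializer(StringDeserializer.class)
            .withoutMetadata());          // drop Kafka metadata, keep KV<key, value>

    // ... rest of the pipeline omitted ...
    p.run();
  }
}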
Omkar