http://deprecated-apache-flink-user-mailing-list-archive.369.s1.nabble.com/same-parallelism-with-different-taskmanager-and-slots-skew-occurs-tp25281p25294.html
Hi Rui,
Such a situation can occur if your data set is skewed, i.e. if keying your data by some key produces partitions of very different sizes. Assume you have 2 TaskManagers (TMs) with 2 slots each and you key your data by some key x. The partition assignment could then look like this:
TM1: slot_1 = Partition_1, slot_2 = Partition_2
TM2: slot_1 = Partition_3, slot_2 = Partition_4
Now assume that Partition_1 and Partition_3 are ten times bigger than Partition_2 and Partition_4. From a TM perspective, both TMs would process the same amount of data, because each TM holds one big and one small partition.
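To make the arithmetic concrete, here is a small sketch that sums the partition sizes per TM for the 2 TMs × 2 slots layout. The relative sizes (10 for the big partitions, 1 for the small ones) are assumed purely for illustration; this is a plain simulation, not Flink code.

```python
# Assumed relative partition sizes: Partition_1 and Partition_3 are ten times bigger.
partition_sizes = {"Partition_1": 10, "Partition_2": 1, "Partition_3": 10, "Partition_4": 1}

# Layout with 2 TMs and 2 slots each, as in the example above.
assignment = {
    "TM1": ["Partition_1", "Partition_2"],
    "TM2": ["Partition_3", "Partition_4"],
}

def load_per_tm(assignment, sizes):
    """Sum the sizes of all partitions assigned to each TM."""
    return {tm: sum(sizes[p] for p in parts) for tm, parts in assignment.items()}

loads = load_per_tm(assignment, partition_sizes)
print(loads)  # {'TM1': 11, 'TM2': 11}
```

Each TM ends up with 10 + 1 = 11 units of work, so the skew between partitions is hidden at the TM level.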
If you now start 4 TMs with a single slot each, you could get the following assignment:
TM1: slot_1 = Partition_1
TM2: slot_1 = Partition_2
TM3: slot_1 = Partition_3
TM4: slot_1 = Partition_4
Now, from a TM perspective, TM1 and TM3 would process ten times more data than TM2 and TM4, even though the overall parallelism (4) is the same as before.
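The same computation for the 4 TMs × 1 slot layout shows the imbalance. Again the partition sizes are assumed for illustration, and the code only simulates the assignment; it does not use any Flink API.

```python
# Same assumed relative partition sizes as before.
partition_sizes = {"Partition_1": 10, "Partition_2": 1, "Partition_3": 10, "Partition_4": 1}

# Layout with 4 TMs and a single slot each.
assignment = {
    "TM1": ["Partition_1"],
    "TM2": ["Partition_2"],
    "TM3": ["Partition_3"],
    "TM4": ["Partition_4"],
}

# Sum the partition sizes handled by each TM.
loads = {tm: sum(partition_sizes[p] for p in parts) for tm, parts in assignment.items()}

# Ratio between the busiest and the least busy TM.
imbalance = max(loads.values()) / min(loads.values())

print(loads)      # {'TM1': 10, 'TM2': 1, 'TM3': 10, 'TM4': 1}
print(imbalance)  # 10.0
```

With one partition per TM there is nothing to average out the skew, so the busiest TM does ten times the work of the least busy one.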
Does this make sense? What you could check is whether you can detect such data skew in your input data (e.g. by counting the occurrences of items with a specific key).
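A minimal sketch of such a check, assuming you can extract the key x from a sample of your input records into a list. The helper name `key_skew` and the sample keys are hypothetical. Note that this only measures skew among keys; which subtask a key actually lands on is decided by Flink's own key-to-partition assignment, so heavy keys may or may not end up on the same slot.

```python
from collections import Counter

def key_skew(keys):
    """Return per-key counts and the ratio of the most to the least frequent key."""
    counts = Counter(keys)
    most = counts.most_common(1)[0][1]
    least = min(counts.values())
    return counts, most / least

# Hypothetical sample of the key x taken from the input records.
sample_keys = ["a"] * 10 + ["b"] + ["c"] * 10 + ["d"]

counts, ratio = key_skew(sample_keys)
print(counts.most_common())  # [('a', 10), ('c', 10), ('b', 1), ('d', 1)]
print(ratio)                 # 10.0
```

A ratio far above 1 indicates that a few keys dominate the input, which is the kind of skew described above.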
Cheers,
Till