http://deprecated-apache-flink-user-mailing-list-archive.369.s1.nabble.com/Parallel-stream-partitions-tp21576p21627.html
Hi Nick,
What Ken said is correct, but let me add two more things.
1) State
Usually, you only need to partition (keyBy()) the data if you want to process tuples with the same key together.
Processing them together means holding some tuples or intermediate results (like partial or running aggregates) in state. Flink is a stateful stream processor and offers many features around state management.
One of them is keyed state, i.e., state that is maintained per key. When a function processes a tuple, keyed state is automatically put into the context of the current key. Because the state is always associated with a key, it is not a problem that a function instance processes multiple keys.
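To make the scoping idea concrete, here is a minimal sketch in plain Java. It is not Flink code: the class KeyedRunningSum and its HashMap are hypothetical stand-ins for what a keyed state backend does for you. It only illustrates why one function instance can serve several keys without mixing up their state.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy model of keyed state (assumption: plain Java collections, NOT the Flink API).
class KeyedRunningSum {
    // One state slot per key, the way a keyed state backend would keep it.
    private final Map<String, Long> sumPerKey = new HashMap<>();

    // Processes one (key, value) tuple and returns the updated running sum for that key.
    long process(String key, long value) {
        // Only the slot of the current key is read and written.
        long updated = sumPerKey.getOrDefault(key, 0L) + value;
        sumPerKey.put(key, updated);
        return updated;
    }

    // Feeds interleaved tuples of two keys through a single instance.
    static List<Long> demo() {
        KeyedRunningSum instance = new KeyedRunningSum();
        List<Long> out = new ArrayList<>();
        out.add(instance.process("a", 1));  // running sum for key a
        out.add(instance.process("b", 10)); // running sum for key b
        out.add(instance.process("a", 2));  // key a again, unaffected by key b
        out.add(instance.process("b", 5));  // key b again, unaffected by key a
        return out;
    }
}
```

In a real job you would not manage the map yourself. You would register a state object (for example a ValueState) in a function applied after keyBy(), and Flink would select the right slot for the current key.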
2) Ordering
In a parallel system, it is very expensive to reason about or guarantee ordering. Flink only ensures that tuples that flow through the same partition are processed in order. Order across different partitions cannot be guaranteed. Hence, shuffles (caused by keyBy() or a change in parallelism) can change the overall order of tuples.
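The effect can be simulated in a few lines of plain Java. Again, this is only a sketch and not Flink code: ShuffleOrderDemo, the routing rule, and the fixed drain order are all assumptions made up for illustration. In a real job the interleaving at the receiver depends on timing and is not predictable.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model of a shuffle into two partitions (assumption: NOT the Flink API).
class ShuffleOrderDemo {
    // Splits the input into two partitions, preserving arrival order inside each one.
    static List<List<String>> partition(List<String> input) {
        List<List<String>> parts = new ArrayList<>();
        parts.add(new ArrayList<>());
        parts.add(new ArrayList<>());
        for (String record : input) {
            // Hypothetical routing rule: records of key "a" go to partition 0, all others to 1.
            int target = record.startsWith("a") ? 0 : 1;
            parts.get(target).add(record);
        }
        return parts;
    }

    // One possible interleaving at the receiver: partition 1 is drained before partition 0.
    static List<String> drainSecondFirst(List<List<String>> parts) {
        List<String> out = new ArrayList<>(parts.get(1));
        out.addAll(parts.get(0));
        return out;
    }
}
```

With the input a1, b1, a2, b2, this interleaving yields b1, b2, a1, a2. The order within each partition survives (a1 before a2, b1 before b2), but the order across partitions does not.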
Best,
Fabian