http://deprecated-apache-flink-user-mailing-list-archive.369.s1.nabble.com/Kafka-partition-alignment-for-event-time-tp4782p4804.html
Hi Shikar!
What you are seeing is that some streams (here the different Kafka Partitions in one source) get merged in the source task. That happens before watermarks are generated.
In such a case, records are out-of-order when they arrive at the timestamp-extractor/watermark generator, and the watermark generator needs to be implemented such that it is aware of these out-of-order records, and uses some heuristic to generate watermarks. This is actually the general case that one also has if timestamps are not ascending inside a single Kafka partition.
You probably want to make use of the simple case, where timestamps are ascending inside one Kafka partition, and use the ascending-timestamp-extractor that auto-generates watermarks.
With Kafka, that one only works when there is 1:1 sources to partitions.
I think we can add some tooling that makes it possible to use the simple ascending timestamp extraction also in cases where one parallel source task has multiple Kafka partitions.
Effectively, the Kafka source has to internally generate the watermarks and use the same "watermark union" technique as for example the join operator.
Greetings,
Stephan