http://deprecated-apache-flink-user-mailing-list-archive.369.s1.nabble.com/how-to-propagate-watermarks-across-multiple-jobs-tp41813.html
I have a job which includes about 50+ tasks. I want to split it to multiple jobs, and the data is transferred through Kafka, but how about watermark?
Is anyone have do something similar and solved this problem?
Here I give an example:
The original job: kafkaStream1(src-topic) => xxxProcess => xxxWindow1 ==> xxxWindow2 resultSinkToKafka(result-topic).
The new job1: kafkaStream1(src-topic) => xxxProcess => xxxWindow1 ==> resultSinkToKafka(mid-topic).
The new job2: kafkaStream1(mid-topic) => xxxWindow2 ==> resultSinkToKafka(result-topic).
The watermark for window1 and window 2 is separated to two jobs, which also seems to be working, but this introduces a 5-minute delay for window2 (both window is 5min's cycle).
The key problem is that the window's cycle is 5min, so the window2 will have a 5min's delay.
If watermark can be transferred between jobs, it is not a problem anymore.