Re: how to propagate watermarks across multiple jobs

Posted by Piotr Nowojski-4 on
URL: http://deprecated-apache-flink-user-mailing-list-archive.369.s1.nabble.com/how-to-propagate-watermarks-across-multiple-jobs-tp41813p41875.html

Hi,

Can't you write the watermark as a special event to the "mid-topic"? In the "new job2" you would parse this event and use it to assign the watermark before `xxxWindow2`. I believe this is what FlinkKafkaShuffle does [1]; you could look at its code for inspiration.
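To make the idea concrete, here is a minimal plain-Java sketch with no Flink dependency. It is not FlinkKafkaShuffle's actual code. The names `Envelope`, `WatermarkTracker` and `replay` are hypothetical, and so is the assumption that every job1 subtask writes its own watermark markers tagged with a subtask id. On the job2 side the combined watermark is the minimum over all job1 subtasks, which is the same rule Flink applies across input channels.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Plain-Java sketch of relaying watermarks through "mid-topic" (all names are hypothetical). */
final class WatermarkRelaySketch {

    /** One message on "mid-topic": either a data record or a watermark marker. */
    static final class Envelope {
        final boolean isWatermark;
        final int producerId;  // job1 subtask that wrote the message (assumed to be known)
        final long timestamp;  // event time of the record, or the watermark value
        final String payload;  // null for watermark markers

        private Envelope(boolean isWatermark, int producerId, long timestamp, String payload) {
            this.isWatermark = isWatermark;
            this.producerId = producerId;
            this.timestamp = timestamp;
            this.payload = payload;
        }

        static Envelope record(int producerId, long timestamp, String payload) {
            return new Envelope(false, producerId, timestamp, payload);
        }

        static Envelope watermark(int producerId, long watermark) {
            return new Envelope(true, producerId, watermark, null);
        }
    }

    /** job2 side: keeps the latest watermark per job1 subtask and exposes the minimum. */
    static final class WatermarkTracker {
        private final int expectedProducers;
        private final Map<Integer, Long> latest = new HashMap<>();
        private long current = Long.MIN_VALUE;

        WatermarkTracker(int expectedProducers) {
            this.expectedProducers = expectedProducers;
        }

        /** Returns the new combined watermark if it advanced, otherwise null. */
        Long onWatermark(Envelope marker) {
            latest.merge(marker.producerId, marker.timestamp, Math::max);
            if (latest.size() < expectedProducers) {
                return null; // some job1 subtask has not reported a watermark yet
            }
            long min = Collections.min(latest.values());
            if (min > current) {
                current = min;
                return min;
            }
            return null;
        }
    }

    /** Replays the topic and returns the watermarks job2 would emit before xxxWindow2. */
    static List<Long> replay(int producers, List<Envelope> messages) {
        WatermarkTracker tracker = new WatermarkTracker(producers);
        List<Long> emitted = new ArrayList<>();
        for (Envelope m : messages) {
            if (!m.isWatermark) {
                continue; // data records would be forwarded to xxxWindow2 unchanged
            }
            Long wm = tracker.onWatermark(m);
            if (wm != null) {
                emitted.add(wm);
            }
        }
        return emitted;
    }

    public static void main(String[] args) {
        List<Envelope> midTopic = List.of(
                Envelope.record(0, 299_999L, "window1 result A"),
                Envelope.watermark(0, 299_999L),
                Envelope.record(1, 299_999L, "window1 result B"),
                Envelope.watermark(1, 299_999L),
                Envelope.watermark(0, 599_999L));
        // Subtask 1 is still at 299999, so the combined watermark stays there.
        System.out.println(replay(2, midTopic)); // prints [299999]
    }
}
```

In a real job2 the tracker logic would sit in a custom watermark generator or operator placed before `xxxWindow2`, and the marker would need a serialization format that the job2 deserializer can tell apart from data records.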

Piotrek 

[1] https://ci.apache.org/projects/flink/flink-docs-release-1.12/api/java/org/apache/flink/streaming/connectors/kafka/shuffle/FlinkKafkaShuffle.html

On Mon, 1 Mar 2021 at 13:01, yidan zhao <[hidden email]> wrote:
I have a job with 50+ tasks. I want to split it into multiple jobs, with the data transferred between them through Kafka, but what about the watermark?

Has anyone done something similar and solved this problem?

Here I give an example:
The original job: kafkaStream1(src-topic) => xxxProcess => xxxWindow1 ==> xxxWindow2 ==> resultSinkToKafka(result-topic).

The new job1: kafkaStream1(src-topic) => xxxProcess => xxxWindow1 ==> resultSinkToKafka(mid-topic).
The new job2: kafkaStream1(mid-topic) => xxxWindow2 ==> resultSinkToKafka(result-topic).

The watermarks for window1 and window2 are now generated separately in the two jobs. This seems to work, but it introduces a 5-minute delay for window2 (both windows have a 5-minute cycle).

The key problem is that the window cycle is 5 minutes, so window2 is delayed by 5 minutes.
If the watermark could be transferred between jobs, this would no longer be a problem.
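The arithmetic behind the 5-minute delay can be sketched as follows. This is a plain-Java illustration under stated assumptions, not code from either job. It assumes that job2 derives its watermark only from record timestamps with an ascending-timestamps strategy (watermark = highest timestamp seen - 1), that a window result carries the window's max timestamp (end - 1), and that an event-time window fires once the watermark reaches that max timestamp. The class and method names are hypothetical.

```java
/** Sketch of why window2 in job2 fires one cycle late (names are hypothetical). */
final class WindowDelaySketch {
    static final long WINDOW_MS = 5 * 60 * 1000L;

    /** Timestamp carried by a result of the window [start, start + WINDOW_MS). */
    static long windowMaxTimestamp(long start) {
        return start + WINDOW_MS - 1;
    }

    /** Watermark derived from records alone, assuming ascending timestamps. */
    static long watermarkFromRecords(long maxSeenTimestamp) {
        return maxSeenTimestamp - 1;
    }

    /** An event-time window fires once the watermark reaches its max timestamp. */
    static boolean fires(long windowStart, long watermark) {
        return watermark >= windowMaxTimestamp(windowStart);
    }

    public static void main(String[] args) {
        long firstResultTs = windowMaxTimestamp(0);           // 299999
        long secondResultTs = windowMaxTimestamp(WINDOW_MS);  // 599999

        // job2 sees the first window1 results: its watermark is 299998, one short.
        System.out.println(fires(0, watermarkFromRecords(firstResultTs)));  // false
        // Only the next batch, 5 minutes later, pushes the watermark far enough.
        System.out.println(fires(0, watermarkFromRecords(secondResultTs))); // true
        // With the job1 watermark (at least 299999) relayed, window2 fires at once.
        System.out.println(fires(0, firstResultTs));                        // true
    }
}
```

Under these assumptions, the record-derived watermark always trails the window1 results by one millisecond, so window2 has to wait for the next window1 output, while a relayed watermark closes it immediately.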