Hello, I have a job with multiple Kafka sources, all of which contain some historical data. If I use event-time windows, the sources with less data run ahead of the sources with more data as far as the watermark is concerned. Is there a solution?
|
|
Hey,

If you are worried about the WindowOperator buffering more data because watermarks/event time do not progress uniformly across the sources, there is currently little you can do. FLIP-27 [1] will allow us to address this problem in a more generic way.

What you can do today is one of two things:

1. Implement a custom throttling function/operator that sits right after the sources and throttles them. If you chain it with the source function, it is a relatively OK solution. Note that while you are blocking execution, you are also blocking other things, for example checkpoints, from happening. So it is better to sleep 10 ms after every record than to sleep 10 seconds once every 1000 records.
2. Throttle the sources themselves (you would need to modify them or write your own custom sources).

In both cases you need to track the event time manually and decide yourself which source should be throttled and by how much.

Best regards,
Piotrek

On Wed, 16 Sep 2020 at 04:17, hao kong <[hidden email]> wrote:
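To make the "track the event time manually and decide which source to throttle" part more concrete, here is a minimal sketch in plain Java. It is only an illustration under assumptions: the class name EventTimeThrottle, the source ids and the maxDriftMillis/pauseMillis parameters are all made up, and nothing here is a Flink API. Wiring it into a function chained to the source, and sharing the per-source event times between parallel subtasks, is left out.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/**
 * Hypothetical helper (not part of Flink): tracks the highest event time seen
 * per source and tells a source to pause briefly when it is too far ahead of
 * the slowest source that has reported so far.
 */
final class EventTimeThrottle {
    private final Map<String, Long> maxEventTimeBySource = new ConcurrentHashMap<>();
    private final long maxDriftMillis; // allowed lead over the slowest source
    private final long pauseMillis;    // short pause applied per record

    EventTimeThrottle(long maxDriftMillis, long pauseMillis) {
        this.maxDriftMillis = maxDriftMillis;
        this.pauseMillis = pauseMillis;
    }

    /** Records the event time and returns how long the source should pause (0 = no pause). */
    long pauseFor(String sourceId, long eventTimeMillis) {
        // Keep the maximum event time seen for this source.
        long own = maxEventTimeBySource.merge(sourceId, eventTimeMillis, Math::max);
        // Find the slowest source among those that have reported at least once.
        long slowest = own;
        for (long t : maxEventTimeBySource.values()) {
            slowest = Math.min(slowest, t);
        }
        return (own - slowest > maxDriftMillis) ? pauseMillis : 0L;
    }

    /** Blocks the calling thread for the computed pause; keep pauseMillis small. */
    void throttle(String sourceId, long eventTimeMillis) throws InterruptedException {
        long pause = pauseFor(sourceId, eventTimeMillis);
        if (pause > 0) {
            Thread.sleep(pause);
        }
    }
}
```

The pause is applied per record and kept short (for example 10 ms) for the reason given above: one long sleep would block checkpoints for the whole duration. Note that a source that has not reported anything yet is simply ignored by this sketch, which may or may not be what you want.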
|
Thanks for the tip!
I am currently trying to implement a ZooKeeper-based coordinator and use it to record the current watermark and control the streams, following your first suggestion.

On Wed, 16 Sep 2020 at 11:56 PM, Piotr Nowojski <[hidden email]> wrote:
|
Great, thanks for the update! And please let us know whether it worked or not.

Piotrek

On Sun, 27 Sep 2020 at 11:20, hao kong <[hidden email]> wrote:
|