Hi, I'm trying to use Kafka as an event store and I want to create several partitions to improve read/write throughput. Occasionally I need to rewind offset to a previous position for recomputing. Since order isn't guaranteed among partitions in Kafka, does this mean that Flink won't produce the same results as before when rewind even if it uses event time? For example, consumer for a partition progresses extremely fast and raises watermark, so events from other partitions are discarded. Is there any ways to prevent this from happening? Thanks in advance! Ruibin |
Hi Ruibin, Are you finding how to generate watermark pre Kafka partition? Flink provides Kafka-partition-aware watermark generation. [1]Best, Vino 邢瑞斌 <[hidden email]> 于2019年12月25日周三 下午8:27写道:
|
In reply to this post by 邢瑞斌
If I understood correctly, different partitions of Kafka would be emitted by different source tasks with different watermark progress. And the Flink framework would align the different watermarks to only output the smallest watermark among them, so the events from slow partitions would not be discarded because the downstream operator would only see the watermark based on the slow partition atm. You can refer to [1] for some details. As for rewinding the offset of partition position, I guess it only happens in failure recovery case or you manually restart the job. Anyway all the topology tasks would be restarted and previous received watermarks are cleared. So it would also not discard the events in this case. Unless you can only rewind some source task to previous positions and keep other downstream tasks still running, it might have the issues you concern. But Flink can not support such operation/function atm. :) [1] https://ci.apache.org/projects/flink/flink-docs-release-1.9/dev/event_timestamps_watermarks.html Best, Zhijiang
|
Free forum by Nabble | Edit this page |