I'm wondering if there is a way to avoid consuming too fast from partitions
that not have as much data as the other ones in the same topic by keeping
them more or less synchronized by its ingestion timestamp. Similar to what
kafka streams does:
https://cwiki.apache.org/confluence/display/KAFKA/KIP-353%3A+Improve+Kafka+Streams+Timestamp+SynchronizationWe are having an issue where partitions with less data are consumed very
fast which creates a lot of windows that can't be triggered until the
partitions with more data are consumed and the watermark gets advanced. It
seems that this issue should be quite common but we can't seem to find any
standard solution to it. Maybe is just that our partitions are too
unbalanced but still, without having a way to bound the skew between
partition (for example when processing accumulated data) it seems like a
potential source of problems.
Anyone have an idea or suggestion to deal with this issue?
Thanks,
Gerard
--
Sent from:
http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/