Timestamp synchronized message consumption across kafka partitions

classic Classic list List threaded Threaded
2 messages Options
Reply | Threaded
Open this post in threaded view
|

Timestamp synchronized message consumption across kafka partitions

gerardg
I'm wondering if there is a way to avoid consuming too fast from partitions
that not have as much data as the other ones in the same topic by keeping
them more or less synchronized by its ingestion timestamp. Similar to what
kafka streams does:
https://cwiki.apache.org/confluence/display/KAFKA/KIP-353%3A+Improve+Kafka+Streams+Timestamp+Synchronization

We are having an issue where partitions with less data are consumed very
fast which creates a lot of windows that can't be triggered until the
partitions with more data are consumed and the watermark gets advanced. It
seems that this issue should be quite common but we can't seem to find any
standard solution to it. Maybe is just that our partitions are too
unbalanced but still, without having a way to bound the skew between
partition (for example when processing accumulated data) it seems like a
potential source of problems.

Anyone have an idea or suggestion to deal with this issue?

Thanks,

Gerard



--
Sent from: http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/
Reply | Threaded
Open this post in threaded view
|

Re: Timestamp synchronized message consumption across kafka partitions

gerardg
I'll answer myself. I guess the most viable option for now is to wait for the work inĀ http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/Sharing-state-between-subtasks-td24489.html

On Thu, Mar 7, 2019, 3:24 PM gerardg <[hidden email]> wrote:
I'm wondering if there is a way to avoid consuming too fast from partitions
that not have as much data as the other ones in the same topic by keeping
them more or less synchronized by its ingestion timestamp. Similar to what
kafka streams does:
https://cwiki.apache.org/confluence/display/KAFKA/KIP-353%3A+Improve+Kafka+Streams+Timestamp+Synchronization

We are having an issue where partitions with less data are consumed very
fast which creates a lot of windows that can't be triggered until the
partitions with more data are consumed and the watermark gets advanced. It
seems that this issue should be quite common but we can't seem to find any
standard solution to it. Maybe is just that our partitions are too
unbalanced but still, without having a way to bound the skew between
partition (for example when processing accumulated data) it seems like a
potential source of problems.

Anyone have an idea or suggestion to deal with this issue?

Thanks,

Gerard



--
Sent from: http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/