Re: Kafka partition alignment for event time
Posted by
shikhar on
URL: http://deprecated-apache-flink-user-mailing-list-archive.369.s1.nabble.com/Kafka-partition-alignment-for-event-time-tp4782p4816.html
I am assigning timestamps using a
threshold-based extractor -- the static delta from last timestamp is probably sufficient and the PriorityQueue for allowing outliers not necessary, that is something I added while figuring out what was going on.
The timestamps across partitions don't differ that much in normal operation when stream processing is caught up with the head of the partitions, so the thresholding works well. However, during catch-up, like if I stop for a bit & start the job again, or there is no offset in ZK and I'm using 'auto.offset.reset=smallest', the source tends to emit messages with much larger deviations, and the timestamp extraction which is not partition-aware will start providing an incorrect watermark.
Aljoscha Krettek wrote
Hi,
in general it should not be a problem if one parallel instance of a sink is responsible for several Kafka partitions. It can become a problem if the timestamps in the different partitions differ by a lot and the watermark assignment logic is not able to handle this.
How are you assigning the timestamps/watermarks in your job?
Cheers,
Aljoscha
> On 08 Feb 2016, at 21:51, shikhar <
[hidden email]> wrote:
>
> Stephan explained in that thread that we're picking the min watermark when
> doing operations that join streams from multiple sources. If we have m:n
> partition-source assignment where m>n, the source is going to end up with
> the max watermark. Having m<=n ensures that the lowest watermark is used.
>
> Re: automatic enforcement, perhaps allowing for more than 1 Kafka partition
> on a source should require opt-in, e.g. allowOversubscription()
>
>
>
> --
> View this message in context:
http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/Kafka-partition-alignment-for-event-time-tp4782p4788.html> Sent from the Apache Flink User Mailing List archive. mailing list archive at Nabble.com.