Kafka watermarks

Kafka watermarks

nragon
When consuming from Kafka, should we use BOOTE inside the consumer or after it?
Thanks

Re: Kafka watermarks

Nico Kruber
Can you clarify a bit more what you want to achieve? Also, what is "BOOTE"?


Nico

On Tuesday, 20 June 2017 13:45:06 CEST nragon wrote:

> When consuming from kafka should we use BOOTE inside consumer or after?
> Thanks



Re: Kafka watermarks

nragon
So, in order to work with event time, I have two options: inside the Kafka consumer or after the Kafka consumer.
For the first I can use:

FlinkKafkaConsumer09<DataParameterMap> consumer = ...;
consumer.assignTimestampsAndWatermarks(...);

The other option:

FlinkKafkaConsumer09<DataParameterMap> consumer = ...;
DataStream<DataParameterMap> dataStream = env.addSource(consumer);
dataStream.assignTimestampsAndWatermarks(...);

Any recommendation?

Re: Kafka watermarks

Nico Kruber
According to the javadoc of
FlinkKafkaConsumerBase#assignTimestampsAndWatermarks():

"Specifies an {@link AssignerWithPunctuatedWatermarks} to emit watermarks in a
punctuated manner. The watermark extractor will run per Kafka partition,
watermarks will be merged across partitions in the same way as in the Flink
runtime, when streams are merged.

When a subtask of a FlinkKafkaConsumer source reads multiple Kafka partitions,
the streams from the partitions are unioned in a "first come first serve"
fashion. Per-partition characteristics are usually lost that way. For example,
if the timestamps are strictly ascending per Kafka partition, they will not be
strictly ascending in the resulting Flink DataStream, if the parallel source
subtask reads more than one partition.

Running timestamp extractors / watermark generators directly inside the Kafka
source, per Kafka partition, allows users to let them exploit the per-
partition characteristics."

Thus, if you can leverage Kafka per-partition characteristics, do it there,
otherwise it probably does not matter.
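The effect described above can be illustrated with a self-contained sketch in plain Java (no Flink dependency; the partition contents and the alternating merge order are made up for the example): two partitions whose timestamps are each strictly ascending lose that property once unioned, while the overall watermark is the minimum of the per-partition watermarks, which is also how Flink combines watermarks of merged streams.

```java
import java.util.ArrayList;
import java.util.List;

public class WatermarkMergeDemo {
    public static void main(String[] args) {
        // Two Kafka-like partitions, each with strictly ascending event timestamps.
        long[] p0 = {1, 5, 9};
        long[] p1 = {100, 101, 102};

        // "First come first serve" union (simulated here as simple alternation):
        // the merged stream is no longer ascending.
        List<Long> union = new ArrayList<>();
        for (int i = 0; i < 3; i++) {
            union.add(p0[i]);
            union.add(p1[i]);
        }
        System.out.println("union ascending? " + isAscending(union)); // false

        // Per-partition watermarks are merged by taking the minimum, so the
        // slowest partition determines overall event-time progress.
        long wm0 = p0[p0.length - 1]; // partition 0 has seen up to t=9
        long wm1 = p1[p1.length - 1]; // partition 1 has seen up to t=102
        long merged = Math.min(wm0, wm1);
        System.out.println("merged watermark: " + merged); // 9
    }

    static boolean isAscending(List<Long> ts) {
        for (int i = 1; i < ts.size(); i++) {
            if (ts.get(i) <= ts.get(i - 1)) return false;
        }
        return true;
    }
}
```

This is why assigning timestamps per partition (inside the consumer) lets an ascending-timestamp extractor keep working, while the same extractor applied after the union would see out-of-order timestamps.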


Nico

On Tuesday, 20 June 2017 17:46:23 CEST nragon wrote:

> So, in order to work with event time, I have two options: inside the Kafka
> consumer or after the Kafka consumer.



Re: Kafka watermarks

nragon
I guess using BoundedOutOfOrdernessTimestampExtractor inside the consumer will work.
Thanks
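A minimal sketch of that approach, assuming a Flink 1.3-era setup with FlinkKafkaConsumer09, a String-valued topic named "example-topic", a broker at localhost:9092, and records whose payload starts with a millisecond timestamp — the topic, broker, group id, and record layout are all assumptions for illustration, and the job needs the Flink streaming and flink-connector-kafka-0.9 dependencies on the classpath:

```java
import java.util.Properties;

import org.apache.flink.streaming.api.TimeCharacteristic;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.api.functions.timestamps.BoundedOutOfOrdernessTimestampExtractor;
import org.apache.flink.streaming.api.windowing.time.Time;
import org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumer09;
import org.apache.flink.streaming.util.serialization.SimpleStringSchema;

public class KafkaWatermarkJob {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        env.setStreamTimeCharacteristic(TimeCharacteristic.EventTime);

        Properties props = new Properties();
        props.setProperty("bootstrap.servers", "localhost:9092"); // assumed broker
        props.setProperty("group.id", "example-group");           // assumed group id

        FlinkKafkaConsumer09<String> consumer =
            new FlinkKafkaConsumer09<>("example-topic", new SimpleStringSchema(), props);

        // Assign timestamps and watermarks *inside* the consumer, i.e. per Kafka
        // partition, allowing up to 10 seconds of out-of-orderness per partition.
        consumer.assignTimestampsAndWatermarks(
            new BoundedOutOfOrdernessTimestampExtractor<String>(Time.seconds(10)) {
                @Override
                public long extractTimestamp(String element) {
                    // Assumed record layout: "<timestampMillis>,<payload>"
                    return Long.parseLong(element.split(",", 2)[0]);
                }
            });

        DataStream<String> stream = env.addSource(consumer);
        // ... event-time windowing etc. on `stream` ...
        env.execute("Kafka watermarks example");
    }
}
```

With this, the extractor runs per partition as described in the javadoc above; assigning it on the DataStream after addSource() instead would run it on the already-unioned stream.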