How to understand watermark creation for Kafka partitions


qq
Hi all,

      I am confused about watermarks for Kafka partitions. As I understand it, watermarks are created at the data-stream level, so why does the documentation also say watermarks are created per Kafka topic partition? In my test, watermarks were also generated globally, even when I ran my job in parallel and assigned watermarks on the Kafka consumer. Thanks.

The text below is copied from the Flink documentation:


you can use Flink’s Kafka-partition-aware watermark generation. Using that feature, watermarks are generated inside the Kafka consumer, per Kafka partition, and the per-partition watermarks are merged in the same way as watermarks are merged on stream shuffles.

For example, if event timestamps are strictly ascending per Kafka partition, generating per-partition watermarks with the ascending timestamps watermark generator will result in perfect overall watermarks.

The illustrations below show how to use the per-Kafka-partition watermark generation, and how watermarks propagate through the streaming dataflow in that case.
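To make the merging rule in the excerpt concrete, here is a small Python sketch. This is not the Flink API; the names are made up for illustration. It shows the rule the documentation describes: the merged watermark an operator emits is the minimum over the per-partition watermarks, the same rule applied when watermarks are merged on stream shuffles.

```python
# Conceptual sketch only -- not Flink code. It mimics how per-partition
# watermarks are combined into the watermark emitted downstream.

def merged_watermark(per_partition_watermarks):
    """The merged watermark is the minimum over all partition watermarks."""
    return min(per_partition_watermarks.values())

# Hypothetical current watermarks for three Kafka partitions.
watermarks = {0: 105, 1: 98, 2: 120}

# The slowest partition (partition 1, at 98) holds back the merged
# watermark, which guarantees no partition's events are declared late
# prematurely.
print(merged_watermark(watermarks))  # prints 98
```

This is why, even with per-partition generation, you observe a single global watermark downstream: the per-partition values exist inside the consumer, but what propagates is their minimum.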




Thanks 
Alex Fu

Re: How to understand watermark creation for Kafka partitions

vino yang
Hi Alex,

>> But why does the documentation say watermarks are created per Kafka topic partition?

IMO, the official documentation explains the reason. I've copied it here:

When using Apache Kafka as a data source, each Kafka partition may have a simple event time pattern (ascending timestamps or bounded out-of-orderness). However, when consuming streams from Kafka, multiple partitions often get consumed in parallel, interleaving the events from the partitions and destroying the per-partition patterns (this is inherent in how Kafka’s consumer clients work).

In that case, you can use Flink’s Kafka-partition-aware watermark generation. Using that feature, watermarks are generated inside the Kafka consumer, per Kafka partition, and the per-partition watermarks are merged in the same way as watermarks are merged on stream shuffles.
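A tiny Python sketch (again, not Flink code) of the point in the first paragraph above: timestamps ascend within each partition, but a consumer reading partitions in parallel interleaves them, so the merged stream loses that order. The partition contents are hypothetical.

```python
# Conceptual sketch, not Flink code. Event timestamps are strictly
# ascending within each Kafka partition:
p0 = [1, 4, 7]
p1 = [2, 5, 9]

def is_ascending(seq):
    return all(a < b for a, b in zip(seq, seq[1:]))

# ...but the Kafka client may fetch a whole batch from p0 before it
# fetches from p1, so the order actually seen by the job can be:
consumed = p0 + p1  # [1, 4, 7, 2, 5, 9]

print(is_ascending(p0), is_ascending(p1))  # True True
print(is_ascending(consumed))              # False: the pattern is gone
```

This is why assigning an ascending-timestamps watermark generator after the source can misbehave, while the partition-aware generation inside the consumer still sees each partition's clean pattern.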


>> In my test, watermarks were also generated globally, even when I ran my job in parallel and assigned watermarks on the Kafka consumer.


Did you follow the official example? Can you share your program?


Best,

Vino

