Flink 1.2 time window operation

classic Classic list List threaded Threaded
6 messages Options
Reply | Threaded
Open this post in threaded view
|

Flink 1.2 time window operation

Dominik Safaric
Hi all,

Lately I’ve been investigating onto the performance characteristics of Flink part of our internal benchmark. Part of this we’ve developed and deployed an application that pools data from Kafka, groups the data by a key during a fixed time window of a minute.

In total, the topic that the KafkaConsumer pooled from consists of 100 million messages each of 100 bytes size. What we were expecting is that no records will be neither read nor produced back to Kafka for the first minute of the window operation - however, this is unfortunately not the case. Below you may find a plot showing the number of records produced per second.

Could anyone provide an explanation onto the behaviour shown in the graph below? What are the reasons behind consuming/producing messages from/to Kafka while the window has not expired yet?

 

Flink_6000ms_Window_Throughput (1).pdf (9K) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: Flink 1.2 time window operation

Tzu-Li (Gordon) Tai
Hi Dominik,

Was the job running with processing time or event time? If event time, how are you producing the watermarks?
Normally to understand how windows are firing in Flink, these two factors would be the place to look at.
I can try to further explain this once you provide info with these. Also, are you using Kafka 0.10?

Cheers,
Gordon

On March 27, 2017 at 11:25:49 PM, Dominik Safaric ([hidden email]) wrote:

Hi all,

Lately I’ve been investigating onto the performance characteristics of Flink part of our internal benchmark. Part of this we’ve developed and deployed an application that pools data from Kafka, groups the data by a key during a fixed time window of a minute.

In total, the topic that the KafkaConsumer pooled from consists of 100 million messages each of 100 bytes size. What we were expecting is that no records will be neither read nor produced back to Kafka for the first minute of the window operation - however, this is unfortunately not the case. Below you may find a plot showing the number of records produced per second.

Could anyone provide an explanation onto the behaviour shown in the graph below? What are the reasons behind consuming/producing messages from/to Kafka while the window has not expired yet?

Reply | Threaded
Open this post in threaded view
|

Re: Flink 1.2 time window operation

Dominik Safaric
Hi Gordon,

The job was run using processing time. The Kafka broker version I’ve used was 0.10.1.1. 

Dominik

On 30 Mar 2017, at 08:35, Tzu-Li (Gordon) Tai <[hidden email]> wrote:

Hi Dominik,

Was the job running with processing time or event time? If event time, how are you producing the watermarks?
Normally to understand how windows are firing in Flink, these two factors would be the place to look at.
I can try to further explain this once you provide info with these. Also, are you using Kafka 0.10?

Cheers,
Gordon

On March 27, 2017 at 11:25:49 PM, Dominik Safaric ([hidden email]) wrote:

Hi all, 

Lately I’ve been investigating onto the performance characteristics of Flink part of our internal benchmark. Part of this we’ve developed and deployed an application that pools data from Kafka, groups the data by a key during a fixed time window of a minute.  

In total, the topic that the KafkaConsumer pooled from consists of 100 million messages each of 100 bytes size. What we were expecting is that no records will be neither read nor produced back to Kafka for the first minute of the window operation - however, this is unfortunately not the case. Below you may find a plot showing the number of records produced per second.  

Could anyone provide an explanation onto the behaviour shown in the graph below? What are the reasons behind consuming/producing messages from/to Kafka while the window has not expired yet?  

Reply | Threaded
Open this post in threaded view
|

Re: Flink 1.2 time window operation

Tzu-Li (Gordon) Tai
Hi,

Thanks for the clarification.

What are the reasons behind consuming/producing messages from/to Kafka while the window has not expired yet?

First, some remarks here -  sources (in your case the Kafka consumer) will not stop fetching / producing data when the windows haven’t fired yet. Does this explain what you have plotted in the diagram you attached (sorry, I can’t really reason about the diagram because I’m not so sure what the values of the x-y axes represent)?

If you’re writing the outputs of the window operation to Kafka (by adding a Kafka sink after the windowing), then yes it should only write to Kafka when the window has fired. The characteristics will also differ for different types of windows, so you should definitely take a look at the Windowing docs [1] about them.

Cheers,
Gordon

On March 30, 2017 at 2:37:41 PM, Dominik Safaric ([hidden email]) wrote:

What are the reasons behind consuming/producing messages from/to Kafka while the window has not expired yet?
Reply | Threaded
Open this post in threaded view
|

Re: Flink 1.2 time window operation

Dominik Safaric

First, some remarks here -  sources (in your case the Kafka consumer) will not stop fetching / producing data when the windows haven’t fired yet.

This is for sure true. However, the plot shows the number of records produced per second, where each record was assigned a created at timestamp while being created and before being pushed back to Kafka. Sorry I did not clarify this before. Anyway, because of this I would expect to have a certain lag. Of course, messages will not only be produced into Kafka exactly at window expiry and then the produced shutdown - however, what concerns me is that messages were produced to Kafka before the first window expired - hence the questions. 

If you’re writing the outputs of the window operation to Kafka (by adding a Kafka sink after the windowing), then yes it should only write to Kafka when the window has fired.

Hence, I this behaviour that you’ve described and we’ve expected did not occur. 

If it would help, I can share the source code and a detail Flink configuration. 

Cheers,
Dominik

On 30 Mar 2017, at 13:09, Tzu-Li (Gordon) Tai <[hidden email]> wrote:

Hi,

Thanks for the clarification.

What are the reasons behind consuming/producing messages from/to Kafka while the window has not expired yet?

First, some remarks here -  sources (in your case the Kafka consumer) will not stop fetching / producing data when the windows haven’t fired yet. Does this explain what you have plotted in the diagram you attached (sorry, I can’t really reason about the diagram because I’m not so sure what the values of the x-y axes represent)?

If you’re writing the outputs of the window operation to Kafka (by adding a Kafka sink after the windowing), then yes it should only write to Kafka when the window has fired. The characteristics will also differ for different types of windows, so you should definitely take a look at the Windowing docs [1] about them.

Cheers,
Gordon

On March 30, 2017 at 2:37:41 PM, Dominik Safaric ([hidden email]) wrote:

What are the reasons behind consuming/producing messages from/to Kafka while the window has not expired yet?

Reply | Threaded
Open this post in threaded view
|

Re: Flink 1.2 time window operation

Tzu-Li (Gordon) Tai
Hi Dominik,

I see, thanks for explaining the diagram.

This is expected because the 1 minute window in your case is aligned with the beginning of every minute.

For example, if the first element element comes at 12:10:45, then the element will be put in the window of 12:10:00 to 12:10:59.
Therefore, it will fire after 14 seconds instead of 1 minute.

Does that explain what you are experiencing?

Cheers,
Gordon


On March 31, 2017 at 3:06:56 AM, Dominik Safaric ([hidden email]) wrote:

First, some remarks here -  sources (in your case the Kafka consumer) will not stop fetching / producing data when the windows haven’t fired yet.

This is for sure true. However, the plot shows the number of records produced per second, where each record was assigned a created at timestamp while being created and before being pushed back to Kafka. Sorry I did not clarify this before. Anyway, because of this I would expect to have a certain lag. Of course, messages will not only be produced into Kafka exactly at window expiry and then the produced shutdown - however, what concerns me is that messages were produced to Kafka before the first window expired - hence the questions. 

If you’re writing the outputs of the window operation to Kafka (by adding a Kafka sink after the windowing), then yes it should only write to Kafka when the window has fired.

Hence, I this behaviour that you’ve described and we’ve expected did not occur. 

If it would help, I can share the source code and a detail Flink configuration. 

Cheers,
Dominik

On 30 Mar 2017, at 13:09, Tzu-Li (Gordon) Tai <[hidden email]> wrote:

Hi,

Thanks for the clarification.

What are the reasons behind consuming/producing messages from/to Kafka while the window has not expired yet?

First, some remarks here -  sources (in your case the Kafka consumer) will not stop fetching / producing data when the windows haven’t fired yet. Does this explain what you have plotted in the diagram you attached (sorry, I can’t really reason about the diagram because I’m not so sure what the values of the x-y axes represent)?

If you’re writing the outputs of the window operation to Kafka (by adding a Kafka sink after the windowing), then yes it should only write to Kafka when the window has fired. The characteristics will also differ for different types of windows, so you should definitely take a look at the Windowing docs [1] about them.

Cheers,
Gordon

On March 30, 2017 at 2:37:41 PM, Dominik Safaric ([hidden email]) wrote:

What are the reasons behind consuming/producing messages from/to Kafka while the window has not expired yet?