Inconsistent Results in Event Time based Sliding Window processing

Sujit Sakre
Hi,

I have been using Flink 1.1.1 with Kafka 0.9 to process real-time streams. We have written our own window function and are processing data with sliding windows. We are using event time with a custom watermark generator.

We select a particular window out of multiple sliding windows and process all items in the window in an iterative loop to increment counts of the items selected. After this we call the sink method to log the result in a database.

This works fine most of the time, i.e. it produces the expected result. However, under the same test conditions there are runs in which certain windows are not processed, so the counts are lower than expected. Sometimes, instead of whole windows being skipped, certain items within a window are not processed. This is unpredictable.

I would like to know what could cause this inconsistent behaviour and how we can resolve it. We have now integrated Flink 1.2 and Kafka 0.10, and the problem persists.

Could you please suggest what the problem could be and how to resolve it?

Many thanks.


Sujit Sakre



Re: Inconsistent Results in Event Time based Sliding Window processing

Nico Kruber
Hi Sujit,
this does indeed sound strange and we are not aware of any data loss issues.
Are there any exceptions or other errors in the job/taskmanager logs?
Do you have a minimal working example? Is it that whole windows are not
processed or just single items inside a window?


Nico


Re: Inconsistent Results in Event Time based Sliding Window processing

Sujit Sakre
Hi Nico,

Thanks for the reply.

There are no exceptions or other errors in the job/task manager logs. I am running this example from the Eclipse IDE with Kafka and Zookeeper running separately; no errors are shown in the console during processing. Previously, we were missing some windows because the watermark had advanced past the timestamps of upcoming data; however, that is no longer the case.

There is a working example. I will share the code and data with you separately by email.

The results of processing are of three types:
1) Complete set of results is obtained without any missing calculations
2) A few windows (one or two, sometimes more) are missing uniformly from calculations
3) In a particular window, only selective data is missing, whereas other data is processed accurately

These results are for the same set of inputs under the same processing steps.

This is not predictable, which makes the error difficult to identify: sometimes it works (pattern #1), and sometimes the results follow pattern #2 or #3 (#3 occurs less frequently, but it does happen).




Sujit Sakre


Re: Inconsistent Results in Event Time based Sliding Window processing

Nico Kruber
Hmm, without any exceptions in the logs, I'd say that you may be on the right
track with elements arriving with timestamps older than the last watermark.
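
To illustrate this hypothesis, here is a minimal Python sketch. It is a simplified model, not Flink code. It assumes that an event-time window counts as late once the watermark has reached the window's last timestamp (end - 1), with no allowed lateness configured. The function names, window size, slide and watermark values are hypothetical. It shows that a single element older than the watermark can be dropped from one of its sliding windows while still being counted in another, which would look like selective data missing from a window.

```python
def assign_sliding_windows(ts, size, slide):
    """Return all sliding windows [start, end) that contain timestamp ts."""
    last_start = ts - ts % slide
    return sorted((start, start + size)
                  for start in range(last_start, ts - size, -slide))


def is_window_late(window_end, watermark):
    """Assumed rule: a window is late once the watermark reaches end - 1."""
    return window_end - 1 <= watermark


def accepting_windows(ts, watermark, size, slide):
    """Windows that would still accept an element arriving at this watermark."""
    return [w for w in assign_sliding_windows(ts, size, slide)
            if not is_window_late(w[1], watermark)]


# Hypothetical values: 10-unit windows sliding every 5 units.
SIZE, SLIDE = 10, 5

# An element with timestamp 12 belongs to two overlapping windows.
print(assign_sliding_windows(12, SIZE, SLIDE))   # [(5, 15), (10, 20)]

# Arriving on time (watermark 8): both windows take it.
print(accepting_windows(12, 8, SIZE, SLIDE))     # [(5, 15), (10, 20)]

# Arriving after the watermark reached 16: only the later window takes it.
print(accepting_windows(12, 16, SIZE, SLIDE))    # [(10, 20)]

# Arriving after the watermark reached 25: dropped from every window.
print(accepting_windows(12, 25, SIZE, SLIDE))    # []
```

If the watermark at the moment an element arrives differs between runs, the set of windows that count it differs as well, even though the input data is identical.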

You could experiment with allowed lateness
https://ci.apache.org/projects/flink/flink-docs-release-1.2/dev/windows.html#allowed-lateness
to see if this is actually the case.
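
As a rough illustration of what allowed lateness changes, the following Python sketch models the behaviour without using Flink. Several things here are assumptions: the watermark is taken as the highest timestamp seen so far minus a fixed out-of-orderness bound and is updated after every element, and an element is dropped for a window when end - 1 + allowed lateness is at or below the watermark. Real Flink jobs emit watermarks on their own schedule and re-fire windows for late elements, neither of which is modelled. All names and numbers are hypothetical.

```python
from collections import Counter


def windows_for(ts, size, slide):
    """Start times of all sliding windows [start, start + size) containing ts."""
    last_start = ts - ts % slide
    return list(range(last_start, ts - size, -slide))


def count_per_window(arrivals, size, slide, out_of_orderness, allowed_lateness):
    """Count elements per window start, dropping elements that arrive too late."""
    watermark = float("-inf")
    counts = Counter()
    dropped = []  # (timestamp, window start) pairs that were discarded
    for ts in arrivals:
        for start in windows_for(ts, size, slide):
            end = start + size
            if end - 1 + allowed_lateness <= watermark:
                dropped.append((ts, start))
            else:
                counts[start] += 1
        # Simplified watermark: advance after each element, never move back.
        watermark = max(watermark, ts - out_of_orderness)
    return counts, dropped


SIZE, SLIDE = 10, 5
in_order = [1, 7, 12, 21]
reordered = [1, 12, 21, 7]  # same elements, but 7 arrives last

expected, _ = count_per_window(in_order, SIZE, SLIDE, 0, 0)
lossy, dropped = count_per_window(reordered, SIZE, SLIDE, 0, 0)
recovered, _ = count_per_window(reordered, SIZE, SLIDE, 0, 15)

print(dropped)                # [(7, 5), (7, 0)]
print(lossy == expected)      # False
print(recovered == expected)  # True
```

In this model, the counts that go missing with zero allowed lateness come back once the lateness is large enough, so a similar experiment on the real job would suggest that late elements are the cause.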


Nico
