Interpretation of Trigger and Eviction on a window

classic Classic list List threaded Threaded
8 messages Options
Reply | Threaded
Open this post in threaded view
|

Interpretation of Trigger and Eviction on a window

nsengupta
Hello there.

I have just started exploring Apache Flink, and it has immediately got me excited. Because I am a beginner, my questions may be a bit naive. Please bear with me.

I refer to this particular sentence from Flink 0.10.0 Guide

After the trigger fires, and before the function (e.g., sumcount) is applied to the window contents, an optional Evictor removes some elements from the beginning of the window before the remaining elements are passed on to the function '

I am a bit confused with the assertion that elements are evicted before the function is applied. Let me elaborate what my understanding is.

Let us say that my window has a 'count' trigger of 10 elements, with some time-pane of 2 seconds (assumption: rate of ingestion is fast enough for at least 10 elements to arrive within 2 seconds). 

windowedStream.trigger(CountTrigger.of(10)).evictor(CounEvictor.of(10)).sum(_._1) // summation of the 2nd field of a tuple

Now, till the time 9 elements have gathered in the window, the trigger is dormant. After the 10th element  enters the window-pane, the trigger is fired. At this point in time, all these 10 elements should be passed to the _sum_ function so that correct summated value is generated and **only then** the evictor is allowed to take out all the 10 elements leaving the window-pane empty. The window's element count is set to zero and  it awaits the arrival of the next element.

However, what the documents seems to suggest is that the evictor will be able to take out _some_ (how many?) elements from the _beginning_ of the window, before the _sum_ function can see the elements. Isn't this counterintuitive or am I missing something obvious here? 

Will keenly wait to hear from you.

-- Nirmalya

--
Software Technologist
http://www.linkedin.com/in/nirmalyasengupta
"If you have built castles in the air, your work need not be lost. That is where they should be.
Now put the foundation under them."
Reply | Threaded
Open this post in threaded view
|

Re: Interpretation of Trigger and Eviction on a window

Fabian Hueske-2
Hi Nirmalya,

it is correct that the evictor is called BEFORE the window function is applied because this is required to support certain types of sliding windows.
If you want to remove all elements from the window after the window function was applied, you need a trigger that purges the window. The CountTrigger is not doing this, how ever you can wrap it in a PurgingTrigger:

windowedStream.trigger(PurgingTrigger.of(CountTrigger.of(10))).sum(_._1)

Best, Fabian

2015-11-27 13:25 GMT+01:00 Nirmalya Sengupta <[hidden email]>:
Hello there.

I have just started exploring Apache Flink, and it has immediately got me excited. Because I am a beginner, my questions may be a bit naive. Please bear with me.

I refer to this particular sentence from Flink 0.10.0 Guide

After the trigger fires, and before the function (e.g., sumcount) is applied to the window contents, an optional Evictor removes some elements from the beginning of the window before the remaining elements are passed on to the function '

I am a bit confused with the assertion that elements are evicted before the function is applied. Let me elaborate what my understanding is.

Let us say that my window has a 'count' trigger of 10 elements, with some time-pane of 2 seconds (assumption: rate of ingestion is fast enough for at least 10 elements to arrive within 2 seconds). 

windowedStream.trigger(CountTrigger.of(10)).evictor(CounEvictor.of(10)).sum(_._1) // summation of the 2nd field of a tuple

Now, till the time 9 elements have gathered in the window, the trigger is dormant. After the 10th element  enters the window-pane, the trigger is fired. At this point in time, all these 10 elements should be passed to the _sum_ function so that correct summated value is generated and **only then** the evictor is allowed to take out all the 10 elements leaving the window-pane empty. The window's element count is set to zero and  it awaits the arrival of the next element.

However, what the documents seems to suggest is that the evictor will be able to take out _some_ (how many?) elements from the _beginning_ of the window, before the _sum_ function can see the elements. Isn't this counterintuitive or am I missing something obvious here? 

Will keenly wait to hear from you.

-- Nirmalya

--
Software Technologist
http://www.linkedin.com/in/nirmalyasengupta
"If you have built castles in the air, your work need not be lost. That is where they should be.
Now put the foundation under them."

Reply | Threaded
Open this post in threaded view
|

Re: Interpretation of Trigger and Eviction on a window

nsengupta
In reply to this post by nsengupta
Hello Fabian,

From your reply to this thread: 
' it is correct that the evictor is called BEFORE the window function is applied because this is required to support certain types of sliding windows. '

This is clear to me now. However, my point was about the way it is described in the User-guide. The guide says this:
After the trigger fires, and before the function (e.g., sumcount) is applied to the window contents, an optional Evictor removes some elements from the beginning of the window before the remaining elements are passed on to the function '

As I read it again, I see where the problem lies. It says some elements are removed before the **rest** are passed to the function. This is not what happens, I think. Evictor removes elements and the function sees this set of removed elements, not the remaining elements. Remaining elements remain in the window and are perhaps picked up by the Evictor next time.

Carrying on from your elaboration, I think guide's statement can be better rearranged as:

After the trigger fires, the function (e.g., sumcount) is applied to the entire contents of the window. However, an optionally provided Evictor, removes some elements from the beginning of the window, according to the criteria of eviction. The function is then applied to this set of __removed__ elements. '
 
Let me know if I am way off the mark here.

-- Nirmalya

--
Software Technologist
http://www.linkedin.com/in/nirmalyasengupta
"If you have built castles in the air, your work need not be lost. That is where they should be.
Now put the foundation under them."
Reply | Threaded
Open this post in threaded view
|

Re: Interpretation of Trigger and Eviction on a window

Aljoscha Krettek
Hi,
the function is in fact applied to the remaining elements (at least I hope it is). So the first sentence should be the correct one.

Cheers,
Aljoscha

> On 28 Nov 2015, at 03:14, Nirmalya Sengupta <[hidden email]> wrote:
>
> Hello Fabian,
>
> From your reply to this thread:
> ' it is correct that the evictor is called BEFORE the window function is applied because this is required to support certain types of sliding windows. '
>
> This is clear to me now. However, my point was about the way it is described in the User-guide. The guide says this:
> ' After the trigger fires, and before the function (e.g., sum, count) is applied to the window contents, an optional Evictor removes some elements from the beginning of the window before the remaining elements are passed on to the function '
>
> As I read it again, I see where the problem lies. It says some elements are removed before the **rest** are passed to the function. This is not what happens, I think. Evictor removes elements and the function sees this set of removed elements, not the remaining elements. Remaining elements remain in the window and are perhaps picked up by the Evictor next time.
>
> Carrying on from your elaboration, I think guide's statement can be better rearranged as:
>
> ' After the trigger fires, the function (e.g., sum, count) is applied to the entire contents of the window. However, an optionally provided Evictor, removes some elements from the beginning of the window, according to the criteria of eviction. The function is then applied to this set of __removed__ elements. '
>  
> Let me know if I am way off the mark here.
>
> -- Nirmalya
>
> --
> Software Technologist
> http://www.linkedin.com/in/nirmalyasengupta
> "If you have built castles in the air, your work need not be lost. That is where they should be.
> Now put the foundation under them."

Reply | Threaded
Open this post in threaded view
|

Re: Interpretation of Trigger and Eviction on a window

nsengupta
In reply to this post by nsengupta
Hi Aljoscha <[hidden email]>,

Thanks for taking interest in my post and question.

 If the Evictor removes elements  _before_ the function is applied, then what happens the first time, the Evictor is acting? That's what I am failing to understand. At the beginning of the operation on the Stream, he Trigger finds 5 elements, the Evictor removes 2 of them (let's say), and Sum sees incorrect number of elements. Isn't this possible or again, am I demonstrating my loose grasp of the matter? 

Please correct me.

-- N


--
Software Technologist
http://www.linkedin.com/in/nirmalyasengupta
"If you have built castles in the air, your work need not be lost. That is where they should be.
Now put the foundation under them."
Reply | Threaded
Open this post in threaded view
|

Re: Interpretation of Trigger and Eviction on a window

Aljoscha Krettek
Hi,
the Evictor is very tricky to understand, I’m afraid. What happens when a Trigger fires is the following:
 1. Trigger fires
 2. Evictor can remove elements from the window buffer
 3. Window function processes the elements that remain in the window buffer

The tricky thing here is that the Evictor should really be called “Keeper”. What it does in fact is specify how many elements should be kept in the buffer. For example, an Evictor “CountEvict(5)” means, keep 5 elements, a “TimeEvictor(5 minutes)” keeps the elements that a younger than 5 minutes. I admit this is somewhat counterintuitive but this is how the policies work in IBM Infosphere Streams and they influenced the early design of the Flink Trigger/Eviction policies. (See here for a description of the semantics of IBM Infosphere Streams: http://www.cs.bilkent.edu.tr/~bgedik/homepage/lib/exe/fetch.php/wiki:pubs:windowing.pdf)

If it is still unclear, could you please but together a working example where the behavior is unclear to you so that we can have a look at it.

Cheers,
Aljoscha

> On 30 Nov 2015, at 12:08, Nirmalya Sengupta <[hidden email]> wrote:
>
> Hi Aljoscha <[hidden email]>,
>
> Thanks for taking interest in my post and question.
>
>  If the Evictor removes elements  _before_ the function is applied, then what happens the first time, the Evictor is acting? That's what I am failing to understand. At the beginning of the operation on the Stream, he Trigger finds 5 elements, the Evictor removes 2 of them (let's say), and Sum sees incorrect number of elements. Isn't this possible or again, am I demonstrating my loose grasp of the matter?
>
> Please correct me.
>
> -- N
>
>
> --
> Software Technologist
> http://www.linkedin.com/in/nirmalyasengupta
> "If you have built castles in the air, your work need not be lost. That is where they should be.
> Now put the foundation under them."

Reply | Threaded
Open this post in threaded view
|

Re: Interpretation of Trigger and Eviction on a window

nsengupta
In reply to this post by nsengupta
Hello Aljoscha <[hidden email]>,

Many thanks for taking time to explain the behaviour of Evictor. The essence of my original post - about how the guide explains an Evictor - was this. I think the guide should make this (counterintuitive) explanation of the parameter to Evictor clearer. May help others, yet uninitiated in the world of Flink! :-)

Because you have offered to clarify further, given the following code snippet:

.....
.trigger(CountTrigger.of(5))
      .evictor(CountEvictor.of(4))
         .maxBy(1)

my understanding (after reading your mail) is that if I am not careful about the parameters I pass to CountTrigger and CountEvictor, my function may not work correctly. In this case, when the window is filled with 5 events, Evictor removes the first event and leaves 4. Thus, the function never sees the 1st event.

Have I understood correctly? Will be happy to hear from you.

-- Nirmalya




--
Software Technologist
http://www.linkedin.com/in/nirmalyasengupta
"If you have built castles in the air, your work need not be lost. That is where they should be.
Now put the foundation under them."
Reply | Threaded
Open this post in threaded view
|

Re: Interpretation of Trigger and Eviction on a window

Fabian Hueske-2
Yes, that is correct. The first element will be lost.

In fact, you do neither need a trigger nor an evictor if you want to get the max element for each group of 5 elements.
See my reply on your other mail.

Cheers,
Fabian

2015-11-30 18:47 GMT+01:00 Nirmalya Sengupta <[hidden email]>:
Hello Aljoscha <[hidden email]>,

Many thanks for taking time to explain the behaviour of Evictor. The essence of my original post - about how the guide explains an Evictor - was this. I think the guide should make this (counterintuitive) explanation of the parameter to Evictor clearer. May help others, yet uninitiated in the world of Flink! :-)

Because you have offered to clarify further, given the following code snippet:

.....
.trigger(CountTrigger.of(5))
      .evictor(CountEvictor.of(4))
         .maxBy(1)

my understanding (after reading your mail) is that if I am not careful about the parameters I pass to CountTrigger and CountEvictor, my function may not work correctly. In this case, when the window is filled with 5 events, Evictor removes the first event and leaves 4. Thus, the function never sees the 1st event.

Have I understood correctly? Will be happy to hear from you.

-- Nirmalya




--
Software Technologist
http://www.linkedin.com/in/nirmalyasengupta
"If you have built castles in the air, your work need not be lost. That is where they should be.
Now put the foundation under them."