One large WindowFunction vs. several smaller ones


Maarten Hamal
Hi

I'm pretty new to Flink and stream processing in general, but I'm writing a thesis about it and I have a small question regarding WindowFunctions (WF). I have one large WF that does essentially two things: counting how many events there are in a window and calculating the difference between this count and that of the previous window. In the end I need the output of both calculations.

I'm unsure whether I should use one large WF that does both, in combination with a simple count-WF to get the result of the count,
OR a small count-WF and a small diff-WF, so that I can run the count-WF first and supply its result to both a sink and the diff-WF.

Graphically these are the two options:
[image: Inline image 1]

If I think about it every option has positive and negative sides:
The blue option (Count-WF + Diff-WF):
  • + has more modularity and every WF does exactly one thing
  • - I need to do an extra KeyBy and Window operation for the Diff-WF with the same values as for the Count-WF (performance hit)
While the red option (Count-WF + Large WF):
  • + might have better performance
  • - has less modularity

So I would like the opinion of someone who has more experience with Flink than I have (so almost everybody here).

Best regards,
Maarten
Re: One large WindowFunction vs. several smaller ones

Nico Kruber
Hi Maarten,
If the Count-WF is counting the number of events per window and the Diff-WF is
just comparing this number to the output of the previous window, then you do
not need a WindowFunction for the Diff-WF after all:

Just use your Count-WF and plug in a stateful map (also see [1]) afterwards
which stores the previous value, compares it, and emits (for example) a tuple
with {key, count, diff}, or have them separate as you wish.
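
As a rough illustration of that idea, here is a minimal sketch in plain Java, without the Flink dependency, of what such a stateful map does per key: remember the previous window's count, compare, and emit {key, count, diff}. The names `DiffAfterCount`, `DiffMapper` and `Result` are made up for this example, the `HashMap` is only a stand-in for Flink's keyed state, and reporting a diff of 0 for the first window of a key is an assumption. In a real job this logic would presumably live in something like a `RichMapFunction` holding a `ValueState<Long>`, applied to the keyed output of the Count-WF.

```java
import java.util.HashMap;
import java.util.Map;

public class DiffAfterCount {

    /** Output record {key, count, diff} (hypothetical type for this sketch). */
    static final class Result {
        final String key;
        final long count;
        final long diff;

        Result(String key, long count, long diff) {
            this.key = key;
            this.count = count;
            this.diff = diff;
        }

        @Override
        public String toString() {
            return "{" + key + ", " + count + ", " + diff + "}";
        }
    }

    /** Stand-in for the stateful map: keeps the last count seen per key. */
    static final class DiffMapper {
        // Plays the role of per-key state; Flink would manage this for you.
        private final Map<String, Long> previousCount = new HashMap<>();

        /** Called once per window result coming out of the Count-WF. */
        Result map(String key, long count) {
            // put() returns the old value, or null if the key was not seen yet.
            Long previous = previousCount.put(key, count);
            // Assumption: the first window of a key has no predecessor, so diff = 0.
            long diff = (previous == null) ? 0L : count - previous;
            return new Result(key, count, diff);
        }
    }

    public static void main(String[] args) {
        DiffMapper mapper = new DiffMapper();
        // Simulated per-window counts for a single key.
        long[] windowCounts = {5, 8, 3};
        for (long count : windowCounts) {
            System.out.println(mapper.map("sensor-a", count));
        }
    }
}
```

With this shape, the count is computed once by the window operator, and the diff step needs no second KeyBy-plus-Window pass over the raw events; it only sees one small record per key and window.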


Nico

[1] https://ci.apache.org/projects/flink/flink-docs-release-1.3/dev/stream/state.html
On Sunday, 13 August 2017 01:47:13 CEST Maarten Hamal wrote:


