One large WindowFunction vs. several smaller ones

Posted by Maarten Hamal on
URL: http://deprecated-apache-flink-user-mailing-list-archive.369.s1.nabble.com/One-large-WindowFunction-vs-several-smaller-ones-tp14856.html

Hi

I'm pretty new to Flink and stream processing in general, but I'm writing a thesis about it and I have a small question regarding WindowFunctions (WF). I have one large WF that essentially does two things: it counts how many events there are in a window, and it calculates the difference between this count and the count of the previous window. In the end I need the output of both calculations.

I'm not sure whether I should use one large WF that does both, in combination with a simple count-WF to get the result of the count,
OR a small count-WF and a small diff-WF, so that I can run the count-WF first and supply its result to both a sink and the diff-WF.

Graphically these are the two options:
[Inline image 1]

When I think about it, each option has pros and cons:
The blue option (Count-WF + Diff-WF):
  • + has more modularity and every WF does exactly one thing
  • - I need to do an extra KeyBy and Window operation for the Diff-WF with the same values as for the Count-WF (performance hit)
While the red option (Count-WF + Large WF):
  • + might have better performance
  • - has less modularity
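
To make the two options concrete, here is a minimal plain-Java sketch of the logic only. It does not use the Flink API: windows are modelled as plain lists of events, and all names (WindowSketch, count, diffs, largeDiff) are hypothetical. It also assumes that the first window is compared against a previous count of 0.

```java
import java.util.ArrayList;
import java.util.List;

public class WindowSketch {

    // Small count step: number of events in one window.
    public static long count(List<String> windowEvents) {
        return windowEvents.size();
    }

    // Small diff step (blue option): works on the counts produced by count().
    // Assumption: the first window is compared against a previous count of 0.
    public static List<Long> diffs(List<Long> counts) {
        List<Long> out = new ArrayList<>();
        long previous = 0;
        for (long c : counts) {
            out.add(c - previous);
            previous = c;
        }
        return out;
    }

    // Large step (red option): counts the raw events itself and emits the diff,
    // so it does not depend on the output of the separate count step.
    public static List<Long> largeDiff(List<List<String>> windows) {
        List<Long> out = new ArrayList<>();
        long previous = 0;
        for (List<String> w : windows) {
            long c = w.size();
            out.add(c - previous);
            previous = c;
        }
        return out;
    }

    public static void main(String[] args) {
        List<List<String>> windows = List.of(
            List.of("a", "b", "c"),
            List.of("d"),
            List.of("e", "f"));

        // Blue option: count first, then feed the counts into the diff step.
        List<Long> counts = new ArrayList<>();
        for (List<String> w : windows) {
            counts.add(count(w));
        }
        System.out.println("blue: counts=" + counts + " diffs=" + diffs(counts));

        // Red option: the counts still come from count(), the diffs from the large step.
        System.out.println("red:  counts=" + counts + " diffs=" + largeDiff(windows));
    }
}
```

Both variants give the same counts and diffs. The difference is only where the counting happens: once, with the result being reused (blue), or twice, on the raw events (red).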

So I would like the opinion of someone who has more experience with Flink than I have (so almost everybody here).

Best regards,
Maarten