Hi
I'm pretty new to Flink and stream processing in general, but I'm writing a thesis about it and I have a small question regarding WindowFunctions (WF).

I have one large WF that does essentially two things: counting how many events there are in a window and calculating the difference between this count and that of the previous window. In the end I need the output of both calculations.

I'm in doubt whether I should use one large WF that does both, in combination with a simple Count-WF to get the result of the count, OR a small Count-WF and a small Diff-WF, so that I can run the Count-WF first and supply its result both to a sink and to the Diff-WF.

Graphically these are the two options:
[image: Inline image 1]

If I think about it, every option has positive and negative sides.

The blue option (Count-WF + Diff-WF):
- (+) has more modularity, and every WF does exactly one thing
- (-) I need to do an extra keyBy and window operation for the Diff-WF with the same values as for the Count-WF (performance hit)

The red option (Count-WF + Large WF):
- (+) might have better performance
- (-) has less modularity

So I would like the opinion of someone who has more experience with Flink than I have (so almost everybody here).

Best regards,
Maarten

Hi Maarten,

If the Count-WF counts the number of events per window and the Diff-WF just compares this number to the output of the previous window, then you do not need a WindowFunction for the Diff-WF after all. Just use your Count-WF and plug in a stateful map afterwards (see also [1]) which stores the previous value, compares it, and emits (for example) a tuple {key, count, diff}, or emits them separately as you wish.

Nico

[1] https://ci.apache.org/projects/flink/flink-docs-release-1.3/dev/stream/state.html

On Sunday, 13 August 2017 01:47:13 CEST Maarten Hamal wrote:
> [...]
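To make the stateful-map suggestion concrete, here is a minimal plain-Python sketch of the dataflow. It is not Flink code: a tumbling-window count per key is followed by a map that remembers the previous count for each key and emits (key, window_start, count, diff). The event shape (key, timestamp in ms), the window size and the names `count_per_window` and `DiffMapper` are hypothetical, chosen only for illustration. In an actual Flink job the map would typically be a `RichMapFunction` (or `RichFlatMapFunction`) applied after a `keyBy` on the same key, holding the previous count in keyed `ValueState` as described in [1].

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Optional, Tuple

# Hypothetical event shape: (key, event timestamp in milliseconds).
Event = Tuple[str, int]
CountRecord = Tuple[str, int, int]                # (key, window_start, count)
DiffRecord = Tuple[str, int, int, Optional[int]]  # (key, window_start, count, diff)


def count_per_window(events: Iterable[Event], window_size_ms: int) -> List[CountRecord]:
    """Stand-in for the Count-WF: count events per key per tumbling window.

    Results are ordered by window start, then key, roughly the order in which
    event-time windows would fire. Windows without events produce no record.
    """
    counts: Dict[Tuple[str, int], int] = defaultdict(int)
    for key, ts in events:
        window_start = ts - ts % window_size_ms
        counts[(key, window_start)] += 1
    return sorted(
        ((key, start, n) for (key, start), n in counts.items()),
        key=lambda rec: (rec[1], rec[0]),
    )


class DiffMapper:
    """Stand-in for the stateful map that replaces the Diff-WF.

    The dict plays the role of keyed state: one "previous count" per key.
    """

    def __init__(self) -> None:
        self._previous: Dict[str, int] = {}

    def map(self, record: CountRecord) -> DiffRecord:
        key, window_start, count = record
        prev = self._previous.get(key)
        # No earlier window for this key yet: there is nothing to diff against.
        diff = None if prev is None else count - prev
        self._previous[key] = count
        return key, window_start, count, diff


# Example: 1-second tumbling windows over a few events for keys "a" and "b".
events = [("a", 100), ("a", 900), ("b", 500), ("a", 1200),
          ("b", 1500), ("b", 1700), ("b", 1800)]
counts = count_per_window(events, window_size_ms=1000)
mapper = DiffMapper()
results = [mapper.map(rec) for rec in counts]
for rec in results:
    print(rec)  # each tuple carries both the count and the diff
```

The point of this shape is that only one window operator is needed: the diff step costs one keyed state lookup per window result rather than a second window. One consequence to keep in mind is that in Flink a tumbling window that receives no events does not fire, so the diff is always taken against the previous non-empty window; if empty windows should count as zero, that needs extra handling.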