Hi Folks: I am going through flink documentation and it states the following: "You should be aware that the elements emitted by a late firing should be treated as updated results of a previous computation, i.e., your data stream will contain multiple results for the same computation. Depending on your application, you need to take these duplicated results into account or deduplicate them." I wanted to find out the following: 1. How do we distinguish the late firing from the main firing ? 2. Does the late firing including all events or only late events ? 3. How does the late vs main firing affect the associated window function ? 4. Are there any examples of how to handle these events and deduplication mentioned in the documentation ? Thanks for your help. Mans |
Hi,
1. You could use a ProcessWindowFunction instead of a WindowFunction. In there, you can query the current watermark and thus determine why the firing is happening. Also, in a ProcessWindowFunction you can keep per-window state, this would allow you to keep a bit of state that can tell you whether this is the first firing for a given window or the number of firings so far. 2. This depends on whether the Trigger is purging or not. The default EventTimeTrigger is not purging, meaning that all elements in the window will be preserved after firing (until the watermark reaches the end of the window plus the allowed lateness). You can turn this into a purging trigger using PurgingTrigger.of(EventTimeTrigger.create()). You would specify this using .trigger() on WindowedStream when constructing your windowed operation. 3. It doesn't, you have to manually keep state in a ProcessWindowFunction to distinguish between different cases, as mentioned above. 4. Currently, I think there are no examples because this depends to a large degree on the specifics of the application. I'm afraid. Best, Aljoscha
|
Thanks Aljoscha for your response. Just to clarify - the only way to handle the duplication scenario properly is by using the ProcessWindowFunction - there is no high level function for this. Thanks again. On Wednesday, August 9, 2017 6:26 AM, Aljoscha Krettek <[hidden email]> wrote: Hi, 1. You could use a ProcessWindowFunction instead of a WindowFunction. In there, you can query the current watermark and thus determine why the firing is happening. Also, in a ProcessWindowFunction you can keep per-window state, this would allow you to keep a bit of state that can tell you whether this is the first firing for a given window or the number of firings so far. 2. This depends on whether the Trigger is purging or not. The default EventTimeTrigger is not purging, meaning that all elements in the window will be preserved after firing (until the watermark reaches the end of the window plus the allowed lateness). You can turn this into a purging trigger using PurgingTrigger.of(EventTimeTrigger.create()). You would specify this using .trigger() on WindowedStream when constructing your windowed operation. 3. It doesn't, you have to manually keep state in a ProcessWindowFunction to distinguish between different cases, as mentioned above. 4. Currently, I think there are no examples because this depends to a large degree on the specifics of the application. I'm afraid. Best, Aljoscha
|
I would say so, yes. But I don't consider ProessWindowFunction to be low-level, it's just the function that should be used for processing windows if you need more information about context.
Best, Aljoscha
|
Free forum by Nabble | Edit this page |