Hello all, I hope you are doing well.
My name is Samim, and I am working on a POC (proof of concept) for a project. In this project we use Apache Flink to process streaming data, find the required patterns, and finally write those patterns to a DB.
To implement this, we used a global window with a custom trigger. While testing, we observed that the output is as expected, but we lose the last few minutes of data when the input stream ends.
For example, if the data stream starts at 1 pm and ends at 5 pm on the same day, the output is missing the data for 4:55 pm to 5 pm. We also observed that when the input stream finishes, the entire process stops immediately and the last few minutes of data remain inside the window.
I am new to Flink and need your help to fix this loss of the last few minutes of data. Is there an API that solves this problem, or is it a limitation of Flink?
It would be great if you could share your views. Please let me know if you need any further information.
I look forward to your input. Thanks in advance.
Thanks, Samim.
Hi,
If you could give us a look at your custom Trigger, we might be able to figure out what's going on.
Best, Aljoscha
Hi,
I think the problem is that the Trigger only uses processing time to determine when to fire. When the job shuts down (which happens when the sources shut down and the whole pipeline is flushed), pending processing-time timers are not fired. You can use the fact that sources emit a Long.MAX_VALUE watermark when they shut down, to signal that no more data will arrive. For this, you have to enable watermarks (you can do this via env.setStreamTimeCharacteristic(TimeCharacteristic.EventTime)) and then register an event-time timer for Long.MAX_VALUE in your Trigger. When the final watermark arrives, your onEventTime() method is called, which allows the Trigger to fire before the job shuts down.
Best, Aljoscha
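To make the mechanism above concrete, here is a minimal sketch in plain Java. It is a toy model, not Flink code: the `Timers` class, the `run` method, the 5-minute interval, and the element values are all hypothetical stand-ins chosen for illustration. It only models the two behaviours described in the reply: pending processing-time timers are dropped at shutdown, while an event-time timer registered for Long.MAX_VALUE is fired by the final watermark.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeSet;

/** Toy, single-window model of end-of-stream timer handling. NOT Flink's API. */
public class EndOfStreamTriggerDemo {

    /** Hypothetical stand-in for the timer registration a trigger context offers. */
    static class Timers {
        final TreeSet<Long> processingTime = new TreeSet<>();
        final TreeSet<Long> eventTime = new TreeSet<>();
        void registerProcessingTimeTimer(long t) { processingTime.add(t); }
        void registerEventTimeTimer(long t) { eventTime.add(t); }
    }

    /** Buffers two elements, ends the stream, and returns what was emitted. */
    static List<String> run(boolean registerFinalTimer) {
        Timers timers = new Timers();
        List<String> window = new ArrayList<>();   // contents of the single global window
        List<String> emitted = new ArrayList<>();  // what reaches the sink
        long now = 0L;                             // simulated wall clock in ms
        long interval = 300_000L;                  // assumed 5-minute firing interval

        for (String element : List.of("a", "b")) {
            window.add(element);
            // Processing-time-only behaviour: fire some time after the element arrives.
            timers.registerProcessingTimeTimer(now + interval);
            // The suggested fix: also register a timer for the final watermark.
            if (registerFinalTimer) {
                timers.registerEventTimeTimer(Long.MAX_VALUE);
            }
            now += 1_000L;
        }

        // The source ends before the wall clock reaches any processing-time timer,
        // so those pending timers are dropped without firing.
        timers.processingTime.clear();

        // On shutdown the source emits a final watermark of Long.MAX_VALUE, which
        // fires every event-time timer with a timestamp at or below it.
        long finalWatermark = Long.MAX_VALUE;
        if (!timers.eventTime.headSet(finalWatermark, true).isEmpty()) {
            emitted.addAll(window);                // the trigger fires and flushes the window
            window.clear();
        }
        return emitted;
    }

    public static void main(String[] args) {
        System.out.println("without final timer: " + run(false));
        System.out.println("with final timer:    " + run(true));
    }
}
```

In a real job the same idea would live in the custom Trigger itself: register the timer with ctx.registerEventTimeTimer(Long.MAX_VALUE) in onElement(), and return TriggerResult.FIRE (or FIRE_AND_PURGE) from onEventTime(). Treat that mapping as an outline to check against the Trigger documentation for your Flink version.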