Hi All,
I'm new to Flink but am trying to write an application that processes data from internet connected sensors. My problem is as follows: -Data arrives in the format: [sensor id] [timestamp in seconds] [sensor value] -Data can arrive out of order (between sensor IDs) by upto 5 minutes. -So a stream of data could be: [1] [100] [20] [2] [101] [23] [1] [105] [31] [1] [140] [17] -Each sensor can sometimes split its measurements, and I'm hoping to 'put them back together' within Flink. For data to be 'put back together' it must have a timestamp within 90 seconds of the timestamp on the first piece of data. The data must also be put back together in order, in the example above for sensor 1 you could only have combinations of (A) the first reading on its own (time 100), (B) the first and third item (time 100 and 105) or (C) the first, third and fourth item (time 100, 105, 140). The second item is a different sensor so not considered in this exercise. -I would like to write something that tries different 'sum' combinations within the 90 second limit and outputs the best 'match' to expected values. In the example above lets say the expected sum values are 50 or 100. Of the three combinations I mentioned for sensor 1, the sum would be 20, 51, or 68. Therefore the 'best' combination is 51 as it is closest to 50 or 100, so it would output two data items: [1] [100] [20] and [1] [105] [31], with the last item left in the stream and matched with any other data points that arrive after. I am thinking some sort of iterative function that does this, but am not sure how to emit the values I want and keep other values that were considered (but not used) in the stream. Any ideas or help is really appreciated? Thanks, Marc -- Sent from: http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/ |
Hi, Marc I think the window operator might fulfill your needs. You could find the detailed description here[1] In general, I think you could choose the correct type of window and use the `ProcessWindowFunction` to emit the elements that match the best sum. ba <[hidden email]> 于2020年5月20日周三 下午9:58写道: Hi All, |
Hi Guowei,
Thank you for your reply. Are you able to give some detail on how that would work with the per window state you linked? I'm struggling to see how the logic would work. I guess something like a session window on a keyed stream (keyed by sensor ID). Timers would fire 90 seconds after each element is added to the window and then be evaluated? I can't quite think how this would work in practise or how to handle the case where timers fire for data that has already been ejected from the window (as it has been matched with past data)? If there are any examples showing similar uses of this function that would be great? Any assistance is very appreciated! Best, Marc -- Sent from: http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/ |
Hi, Marc 1. I think you should choose which type of window you want to use first. (Thumbling/Sliding/Session) From your description, I think the session window maybe not suit your case because there is no gap. 2. >>> how this would work in practise or how to handle the case where timers fire for data that has already been ejected from the window (as it has been matched with past data)? Do you want to know the lifecycle of the element in the window? I think you could know that the lifecycle of the window and element in it after you choose your window type. For example, the element could be assigned to multiple slide windows and an element ejected from a sliding window could be processed from another sliding window.[1] 3. I think you could find some examples in the `WindowTranslationTest`. 4. If these window types do not work for your application. I think you might need a customized window(trigger/evictor). However, I think you could make a simple POC with the current type window first. Best, Guowei ba <[hidden email]> 于2020年5月21日周四 下午4:00写道: Hi Guowei, |
Free forum by Nabble | Edit this page |