Posted by ba. URL: http://deprecated-apache-flink-user-mailing-list-archive.369.s1.nabble.com/Stream-Iterative-Matching-tp35304.html
Hi All,
I'm new to Flink but am trying to write an application that processes data
from internet-connected sensors.
My problem is as follows:
-Data arrives in the format: [sensor id] [timestamp in seconds] [sensor
value]
-Data can arrive out of order (across sensor IDs) by up to 5 minutes.
-So a stream of data could be:
[1] [100] [20]
[2] [101] [23]
[1] [105] [31]
[1] [140] [17]
-Each sensor can sometimes split its measurements, and I'm hoping to 'put
them back together' within Flink. For data to be 'put back together', it must
have a timestamp within 90 seconds of the timestamp on the first piece of
data. The data must also be put back together in order. In the example above,
for sensor 1 you could only have combinations of (A) the first reading on
its own (time 100), (B) the first and third items (times 100 and 105), or (C)
the first, third, and fourth items (times 100, 105, and 140). The second item
is from a different sensor, so it is not considered in this exercise.
-I would like to write something that tries different 'sum' combinations
within the 90-second limit and outputs the best 'match' to expected values.
In the example above, let's say the expected sum values are 50 or 100. Of the
three combinations I mentioned for sensor 1, the sums would be 20, 51, or 68.
Therefore the 'best' combination is 51, as it is closest to 50 or 100, so it
would output two data items: [1] [100] [20] and [1] [105] [31], with the
last item left in the stream to be matched with any other data points that
arrive after it.
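To make the rule I'm describing concrete, here is a minimal, Flink-independent sketch of the matching logic (the function name `best_prefix` and its signature are my own illustration, not any Flink API): for one sensor's buffered readings, it enumerates the in-order prefixes whose last timestamp is within 90 seconds of the first, picks the prefix whose sum is closest to any expected value, and returns it along with the leftover readings.

```python
def best_prefix(readings, targets, window=90):
    """readings: one sensor's (timestamp, value) pairs, sorted by time.
    Returns (emitted, remaining): the in-order prefix whose sum is
    closest to any target, and the readings left in the stream."""
    if not readings:
        return [], []
    t0 = readings[0][0]
    # Data must be recombined in order, so only prefixes are candidates,
    # and only while the timestamp stays within `window` of the first.
    candidates = []
    prefix = []
    for ts, val in readings:
        if ts - t0 > window:
            break
        prefix.append((ts, val))
        candidates.append(list(prefix))

    def distance(p):
        s = sum(v for _, v in p)
        return min(abs(s - t) for t in targets)

    best = min(candidates, key=distance)
    return best, readings[len(best):]

# The example above: sensor 1's readings, expected sums 50 or 100.
# Prefix sums are 20, 51, and 68; 51 is closest to a target, so the
# first two readings are emitted and [140, 17] stays in the stream.
emitted, remaining = best_prefix([(100, 20), (105, 31), (140, 17)], [50, 100])
```

This only captures the per-sensor selection step; the open question is how to express the "emit some, keep the rest" part inside a Flink stream.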
I am thinking of some sort of iterative function that does this, but am not
sure how to emit the values I want while keeping the other values that were
considered (but not used) in the stream.
Any ideas or help would be really appreciated.
Thanks,
Marc