orry for the late reply. You're right that with the windowed join you would have to deal with pairs where the timestamp of (x,y) is not necessarily earlier than the timestamp of z. Moreover, by using sliding windows you would receive duplicates as you've described. Using tumbling windows would mean that you lose join matches if (x,y) lives in an earlier window. Thus, in order to solve your problem you would have to write a custom stream operator.The stream operator would do the following: Collecting the inputs from (x,y) and z which are already keyed. Thus, we know that x=z holds true. Using a priority queue we order the elements because we don't know how the arrive at the operator. Whenever we receive a watermark indicating that no earlier events can arrive anymore, we can go through the two priority queues to join the elements. The queues are part of the operators state so that we don't lose information in case of a recovery.I've sketched such an operator here [1]. I hope this helps you to get started.
Free forum by Nabble | Edit this page |