I have three data streams:
1. app exposure and click
2. app download
3. app install
How can I merge the streams into a unified stream and then compute over it using time-based windows?
Thanks
|
Hi,
there are basically two operations to merge streams.
1. Union simply merges the input streams, so the resulting stream contains the records of all input streams. Union is a built-in operator in the DataStream API. It requires all streams to have the same data type.
2. Join connects records of different streams according to a join condition. When joining streams, this condition is often based on time bounds. A join usually needs to be implemented manually using a stateful CoProcessFunction.
Once the streams are unioned or joined, you can apply a time window to the resulting stream.
Best, Fabian
2017-07-19 9:05 GMT+02:00 Jone Zhang <[hidden email]>:
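To make the union-then-window idea concrete, here is a minimal sketch in plain Python. It only models the semantics and does not use the Flink DataStream API. The record layout `(app_id, timestamp_ms)`, the helper names, and the one-minute window size are assumptions made for illustration.

```python
from collections import Counter

WINDOW_MS = 60_000  # assumed 1-minute tumbling window


def tag(stream, event_type):
    """Map (app_id, ts) records to a common (type, app_id, ts) shape,
    since a union needs all inputs to share one data type."""
    return [(event_type, app_id, ts) for app_id, ts in stream]


def union(*streams):
    """Union: keep every record of every input, with no matching between them."""
    return [record for stream in streams for record in stream]


def window_start(ts, size=WINDOW_MS):
    """Start of the tumbling window that contains ts."""
    return ts - ts % size


def count_per_window(records):
    """Count events per (window start, app_id, event type)."""
    counts = Counter()
    for event_type, app_id, ts in records:
        counts[(window_start(ts), app_id, event_type)] += 1
    return counts


# Hypothetical sample data for the three streams.
clicks = [("app1", 5_000), ("app1", 20_000), ("app2", 61_000)]
downloads = [("app1", 30_000)]
installs = [("app1", 65_000)]

unified = union(tag(clicks, "click"), tag(downloads, "download"), tag(installs, "install"))
counts = count_per_window(unified)
```

Tagging each record with its source before the union is one possible way to satisfy the same-data-type requirement while still being able to tell the three kinds of events apart inside the window.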
|
Hi,
To expand on Fabian's answer, there are a few APIs for joins.
* connect - you have to provide a CoProcessFunction.
* window join/cogroup - you provide key selector functions, a time window, and a join/cogroup function.
With the first method, you have to write more code in exchange for a much more flexible join condition.
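As a rough illustration of what a window join computes, the following plain-Python sketch pairs records from two inputs that share a key and fall into the same tumbling window. It is a model of the semantics only, not Flink code. The function names, the `(app_id, timestamp_ms)` record layout, and the sample data are assumptions.

```python
from collections import defaultdict

WINDOW_MS = 60_000  # assumed 1-minute tumbling window


def window_start(ts, size=WINDOW_MS):
    """Start of the tumbling window that contains ts."""
    return ts - ts % size


def window_join(left, right, join_fn, size=WINDOW_MS):
    """Pair every left and right record with the same key in the same window."""
    buckets = defaultdict(lambda: ([], []))
    for key, ts in left:
        buckets[(key, window_start(ts, size))][0].append(ts)
    for key, ts in right:
        buckets[(key, window_start(ts, size))][1].append(ts)

    joined = []
    for (key, start), (lefts, rights) in sorted(buckets.items()):
        for l_ts in lefts:
            for r_ts in rights:
                joined.append(join_fn(key, start, l_ts, r_ts))
    return joined


# Hypothetical sample data: (app_id, timestamp_ms).
clicks = [("app1", 5_000), ("app1", 20_000), ("app2", 61_000)]
downloads = [("app1", 30_000), ("app2", 10_000)]

samples = window_join(clicks, downloads, lambda key, start, c, d: (key, start, c, d))
```

In this sample the two `app2` records are never paired because they land in different windows. A join condition such as "download within N seconds of the click, regardless of window boundaries" is the kind of case where the connect approach with custom state gives more flexibility.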
Regards,
Kien
On Jul 20, 2017, at 01:55, Fabian Hueske <[hidden email]> wrote:
|
Thanks for your reply. I have another question: in my situation, each of the three streams carries its own local timestamp field. How can I ensure that their timestamps are consistent within each time window before the merge? And how can I ensure that the records of all streams with consistent timestamps arrive in each time window? Thanks.
2017-07-20 13:39 GMT+08:00 Kien Truong <[hidden email]>:
|
What do you mean by "consistent"? Of course, you can only do this at the time the timestamp is assigned (e.g. by synchronizing clocks with NTP). However, this is never perfect. Even then, it is unrealistic to expect the records to always end up in the same window, because of network delays etc. You will need a global state here that is defined based on your use case (why do you need this?).
|
“Consistent” means that within the same time window, the timestamps of the three streams should be the same. In my application, I am trying to build an online learning system. I need to join streams 1 and 2 on the SAME timestamp to form training samples, which will be fed to an online learning algorithm. Thanks
2017-07-25 14:40 GMT+08:00 Jörn Franke <[hidden email]>:
|
If you are using tumbling time windows, then the aggregated records emitted from a window all carry the same timestamp: the maximum timestamp that the window would have accepted. For example, with an hourly tumbling window, the window from 2 to 3 PM includes all timestamps in [14:00:00.000, 15:00:00.000), so the timestamp assigned to the window's results would be 14:59:59.999 (at millisecond resolution).
2017-07-25 11:59 GMT+02:00 Jone Zhang <[hidden email]>:
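The window arithmetic can be checked with a small plain-Python sketch. It assumes millisecond timestamps, windows aligned to the epoch with no offset, and a maximum timestamp of `end - 1`, which is how Flink's `TimeWindow.maxTimestamp()` is defined. For readability the timestamps are given as milliseconds since midnight, and the helper names are made up for this example.

```python
HOUR_MS = 3_600_000


def tumbling_window(ts_ms, size_ms=HOUR_MS):
    """Return (start, end, max_timestamp) of the window containing ts_ms.
    The window covers [start, end), so the largest accepted timestamp is end - 1."""
    start = ts_ms - ts_ms % size_ms
    end = start + size_ms
    return start, end, end - 1


def fmt(ms_of_day):
    """Format milliseconds since midnight as HH:MM:SS.mmm."""
    hours, rem = divmod(ms_of_day, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    seconds, millis = divmod(rem, 1_000)
    return f"{hours:02d}:{minutes:02d}:{seconds:02d}.{millis:03d}"


# A record stamped 14:25:10.500, expressed as milliseconds since midnight.
ts = 14 * 3_600_000 + 25 * 60_000 + 10 * 1_000 + 500
start, end, max_ts = tumbling_window(ts)
print(fmt(start), fmt(end), fmt(max_ts))  # 14:00:00.000 15:00:00.000 14:59:59.999
```

Because results from the same window of different streams receive the same timestamp, they can be matched on that timestamp in a later join step.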
|