How can i merge more than one flink stream

classic Classic list List threaded Threaded
7 messages Options
Reply | Threaded
Open this post in threaded view
|

How can i merge more than one flink stream

Jone Zhang
I have three data streams
1. app exposed and click
2. app download
3. app install

How can i merge the streams to create a unified stream,then compute it on time-based windows

Thanks
Reply | Threaded
Open this post in threaded view
|

Re: How can i merge more than one flink stream

Fabian Hueske-2
Hi,

there are basically two operations to merge streams.

1. Union simply merges the input streams such that the resulting stream has the records of all input streams. Union is a built-in operator in the DataStream API. For that all streams must have the same data type.
2. Join connects records of streams according to a join condition. When joining streams, this condition is often based on some time bounds. Join usually needs to be manually implemented using a stateful CoProcessFunction.

Once the streams are unioned or joined, you can apply a time-window on the result stream.

Best, Fabian

2017-07-19 9:05 GMT+02:00 Jone Zhang <[hidden email]>:
I have three data streams
1. app exposed and click
2. app download
3. app install

How can i merge the streams to create a unified stream,then compute it on time-based windows

Thanks

Reply | Threaded
Open this post in threaded view
|

Re: How can i merge more than one flink stream

Kien Truong
Hi,

To expand on Fabian's answer, there's a few API for join.

* connect - you have to provide a CoprocessFunction.

* window join/cogroup - you provide  key selector functions, a time window and a join/cogroup function.

With the first method, you have to write more code, in exchange for much more flexible join condition.

Regards,
Kien

On Jul 20, 2017, at 01:55, Fabian Hueske <[hidden email]> wrote:
Hi,

there are basically two operations to merge streams.

1. Union simply merges the input streams such that the resulting stream has the records of all input streams. Union is a built-in operator in the DataStream API. For that all streams must have the same data type.
2. Join connects records of streams according to a join condition. When joining streams, this condition is often based on some time bounds. Join usually needs to be manually implemented using a stateful CoProcessFunction.

Once the streams are unioned or joined, you can apply a time-window on the result stream.

Best, Fabian

2017-07-19 9:05 GMT+02:00 Jone Zhang <[hidden email]>:
I have three data streams
1. app exposed and click
2. app download
3. app install

How can i merge the streams to create a unified stream,then compute it on time-based windows

Thanks

Reply | Threaded
Open this post in threaded view
|

Re: How can i merge more than one flink stream

Jone Zhang
Thanks for your reply. I have another question:
In my situation, each of the three streams contains a local timestamp segment. How can I ensure that their timestamps are consistent in each time window before the merging operation? And how to ensure the arrival of all the streams with consistent timestamps in each time window?

Thanks.

2017-07-20 13:39 GMT+08:00 Kien Truong <[hidden email]>:
Hi,

To expand on Fabian's answer, there's a few API for join.

* connect - you have to provide a CoprocessFunction.

* window join/cogroup - you provide  key selector functions, a time window and a join/cogroup function.

With the first method, you have to write more code, in exchange for much more flexible join condition.

Regards,
Kien

On Jul 20, 2017, at 01:55, Fabian Hueske <[hidden email]> wrote:
Hi,

there are basically two operations to merge streams.

1. Union simply merges the input streams such that the resulting stream has the records of all input streams. Union is a built-in operator in the DataStream API. For that all streams must have the same data type.
2. Join connects records of streams according to a join condition. When joining streams, this condition is often based on some time bounds. Join usually needs to be manually implemented using a stateful CoProcessFunction.

Once the streams are unioned or joined, you can apply a time-window on the result stream.

Best, Fabian

2017-07-19 9:05 GMT+02:00 Jone Zhang <[hidden email]>:
I have three data streams
1. app exposed and click
2. app download
3. app install

How can i merge the streams to create a unified stream,then compute it on time-based windows

Thanks


Reply | Threaded
Open this post in threaded view
|

Re: How can i merge more than one flink stream

Jörn Franke
What do you mean by "consistent"? 
Of course you can do this only at the time the timpstamp is defined (e.g. Using NTP). However, this is never perfect .
Then it is unrealistic that they always end up in the same window because of network delays etc. you will need here a global state that is defined based on your use case (why do you need this?)

On 25. Jul 2017, at 08:35, Jone Zhang <[hidden email]> wrote:

Thanks for your reply. I have another question:
In my situation, each of the three streams contains a local timestamp segment. How can I ensure that their timestamps are consistent in each time window before the merging operation? And how to ensure the arrival of all the streams with consistent timestamps in each time window?

Thanks.

2017-07-20 13:39 GMT+08:00 Kien Truong <[hidden email]>:
Hi,

To expand on Fabian's answer, there's a few API for join.

* connect - you have to provide a CoprocessFunction.

* window join/cogroup - you provide  key selector functions, a time window and a join/cogroup function.

With the first method, you have to write more code, in exchange for much more flexible join condition.

Regards,
Kien

On Jul 20, 2017, at 01:55, Fabian Hueske <[hidden email]> wrote:
Hi,

there are basically two operations to merge streams.

1. Union simply merges the input streams such that the resulting stream has the records of all input streams. Union is a built-in operator in the DataStream API. For that all streams must have the same data type.
2. Join connects records of streams according to a join condition. When joining streams, this condition is often based on some time bounds. Join usually needs to be manually implemented using a stateful CoProcessFunction.

Once the streams are unioned or joined, you can apply a time-window on the result stream.

Best, Fabian

2017-07-19 9:05 GMT+02:00 Jone Zhang <[hidden email]>:
I have three data streams
1. app exposed and click
2. app download
3. app install

How can i merge the streams to create a unified stream,then compute it on time-based windows

Thanks


Reply | Threaded
Open this post in threaded view
|

Re: How can i merge more than one flink stream

Jone Zhang
“Consistent” means that in the same time window, the timestamps of the three streams should be kept the same.

In my application, I am trying to build an online learning system. I need to join the streams from 1 and 2 on the SAME timestamp to form training samples which will be fed to some online learning algorithm. 

Thanks

2017-07-25 14:40 GMT+08:00 Jörn Franke <[hidden email]>:
What do you mean by "consistent"? 
Of course you can do this only at the time the timpstamp is defined (e.g. Using NTP). However, this is never perfect .
Then it is unrealistic that they always end up in the same window because of network delays etc. you will need here a global state that is defined based on your use case (why do you need this?)

On 25. Jul 2017, at 08:35, Jone Zhang <[hidden email]> wrote:

Thanks for your reply. I have another question:
In my situation, each of the three streams contains a local timestamp segment. How can I ensure that their timestamps are consistent in each time window before the merging operation? And how to ensure the arrival of all the streams with consistent timestamps in each time window?

Thanks.

2017-07-20 13:39 GMT+08:00 Kien Truong <[hidden email]>:
Hi,

To expand on Fabian's answer, there's a few API for join.

* connect - you have to provide a CoprocessFunction.

* window join/cogroup - you provide  key selector functions, a time window and a join/cogroup function.

With the first method, you have to write more code, in exchange for much more flexible join condition.

Regards,
Kien

On Jul 20, 2017, at 01:55, Fabian Hueske <[hidden email]> wrote:
Hi,

there are basically two operations to merge streams.

1. Union simply merges the input streams such that the resulting stream has the records of all input streams. Union is a built-in operator in the DataStream API. For that all streams must have the same data type.
2. Join connects records of streams according to a join condition. When joining streams, this condition is often based on some time bounds. Join usually needs to be manually implemented using a stateful CoProcessFunction.

Once the streams are unioned or joined, you can apply a time-window on the result stream.

Best, Fabian

2017-07-19 9:05 GMT+02:00 Jone Zhang <[hidden email]>:
I have three data streams
1. app exposed and click
2. app download
3. app install

How can i merge the streams to create a unified stream,then compute it on time-based windows

Thanks



Reply | Threaded
Open this post in threaded view
|

Re: How can i merge more than one flink stream

Fabian Hueske-2
If you are using tumbling time-windows, then the timestamp of the aggregated records emitted from the window are all the maximum timestamp that would have been accepted for the window.
For example, if you have an hourly tumbling window, the window from 2 to 3 o'clock would include all timestamps between [14:00:00.000, 15:00:00.000), so the maximum timestamp that would be assigned to the window would be 14:59:59.000.
It does not matter which records were assigned to a window. The timestamp of every emitted record will be the maximum timestamp.

If you are using aligned windows, such as tumbling windows, on different streams, they will have consistent timestamps on which you can join.

Does this answer your question?

Best, Fabian

2017-07-25 11:59 GMT+02:00 Jone Zhang <[hidden email]>:
“Consistent” means that in the same time window, the timestamps of the three streams should be kept the same.

In my application, I am trying to build an online learning system. I need to join the streams from 1 and 2 on the SAME timestamp to form training samples which will be fed to some online learning algorithm. 

Thanks

2017-07-25 14:40 GMT+08:00 Jörn Franke <[hidden email]>:
What do you mean by "consistent"? 
Of course you can do this only at the time the timpstamp is defined (e.g. Using NTP). However, this is never perfect .
Then it is unrealistic that they always end up in the same window because of network delays etc. you will need here a global state that is defined based on your use case (why do you need this?)

On 25. Jul 2017, at 08:35, Jone Zhang <[hidden email]> wrote:

Thanks for your reply. I have another question:
In my situation, each of the three streams contains a local timestamp segment. How can I ensure that their timestamps are consistent in each time window before the merging operation? And how to ensure the arrival of all the streams with consistent timestamps in each time window?

Thanks.

2017-07-20 13:39 GMT+08:00 Kien Truong <[hidden email]>:
Hi,

To expand on Fabian's answer, there's a few API for join.

* connect - you have to provide a CoprocessFunction.

* window join/cogroup - you provide  key selector functions, a time window and a join/cogroup function.

With the first method, you have to write more code, in exchange for much more flexible join condition.

Regards,
Kien

On Jul 20, 2017, at 01:55, Fabian Hueske <[hidden email]> wrote:
Hi,

there are basically two operations to merge streams.

1. Union simply merges the input streams such that the resulting stream has the records of all input streams. Union is a built-in operator in the DataStream API. For that all streams must have the same data type.
2. Join connects records of streams according to a join condition. When joining streams, this condition is often based on some time bounds. Join usually needs to be manually implemented using a stateful CoProcessFunction.

Once the streams are unioned or joined, you can apply a time-window on the result stream.

Best, Fabian

2017-07-19 9:05 GMT+02:00 Jone Zhang <[hidden email]>:
I have three data streams
1. app exposed and click
2. app download
3. app install

How can i merge the streams to create a unified stream,then compute it on time-based windows

Thanks