Does Flink support stream-stream outer joins in the latest version?

classic Classic list List threaded Threaded
8 messages Options
Reply | Threaded
Open this post in threaded view
|

Does Flink support stream-stream outer joins in the latest version?

kant kodali
Hi All,

Does Flink support stream-stream outer joins in the latest version?

Thanks!
Reply | Threaded
Open this post in threaded view
|

Re: Does Flink support stream-stream outer joins in the latest version?

Xingcan Cui
Hi Kant,

I suppose you refer to the stream join in SQL/Table API since the outer join for windowed-streams can always be achieved with the `JoinFunction` in DataStream API.

There are two kinds of stream joins, namely, the time-windowed join and the non-windowed join in Flink SQL/Table API [1, 2]. The time-windowed outer join has been supported since version 1.5 and the non-windowed outer join is still work in progress.

Hope that helps.

Best,
Xingcan



On 7 Mar 2018, at 12:45 AM, kant kodali <[hidden email]> wrote:

Hi All,

Does Flink support stream-stream outer joins in the latest version?

Thanks!

Reply | Threaded
Open this post in threaded view
|

Re: Does Flink support stream-stream outer joins in the latest version?

Hequn Cheng
Hi Kant,

The stream-stream outer joins are work in progress now(left/right/full), and will probably be ready before the end of this month. You can check the progress from[1]. 

Best, Hequn


On Wed, Mar 7, 2018 at 1:01 PM, Xingcan Cui <[hidden email]> wrote:
Hi Kant,

I suppose you refer to the stream join in SQL/Table API since the outer join for windowed-streams can always be achieved with the `JoinFunction` in DataStream API.

There are two kinds of stream joins, namely, the time-windowed join and the non-windowed join in Flink SQL/Table API [1, 2]. The time-windowed outer join has been supported since version 1.5 and the non-windowed outer join is still work in progress.

Hope that helps.

Best,
Xingcan



On 7 Mar 2018, at 12:45 AM, kant kodali <[hidden email]> wrote:

Hi All,

Does Flink support stream-stream outer joins in the latest version?

Thanks!


Reply | Threaded
Open this post in threaded view
|

Re: Does Flink support stream-stream outer joins in the latest version?

kant kodali
Hi!

Thanks for all this. and yes I was indeed talking about SQL/Table API so I will keep track of these tickets! BTW, What is non-windowed Join? I thought stream-stream-joins by default is a stateful operation so it has to be within some time window right? Also does the output of stream-stream joins emit every time so we can see the state of the join at any given time or only when the watermark elapses and join result fully materializes? 

On a side note, Full outer join seems to be the most useful for my use case. so the moment its available in master I can start playing and testing it!

On Tue, Mar 6, 2018 at 10:39 PM, Hequn Cheng <[hidden email]> wrote:
Hi Kant,

The stream-stream outer joins are work in progress now(left/right/full), and will probably be ready before the end of this month. You can check the progress from[1]. 

Best, Hequn


On Wed, Mar 7, 2018 at 1:01 PM, Xingcan Cui <[hidden email]> wrote:
Hi Kant,

I suppose you refer to the stream join in SQL/Table API since the outer join for windowed-streams can always be achieved with the `JoinFunction` in DataStream API.

There are two kinds of stream joins, namely, the time-windowed join and the non-windowed join in Flink SQL/Table API [1, 2]. The time-windowed outer join has been supported since version 1.5 and the non-windowed outer join is still work in progress.

Hope that helps.

Best,
Xingcan



On 7 Mar 2018, at 12:45 AM, kant kodali <[hidden email]> wrote:

Hi All,

Does Flink support stream-stream outer joins in the latest version?

Thanks!



Reply | Threaded
Open this post in threaded view
|

Re: Does Flink support stream-stream outer joins in the latest version?

Hequn Cheng
Hi kant,

It seems that you mean the Time-windowed Join. The Time-windowed Joins are supported now. You can check more details with the docs given by Xingcan.
As for the non-window join, it is used to join two unbounded stream and the semantic is very like batch join.

Time-windowed Join:
SELECT *
FROM Orders o, Shipments s
WHERE o.id = s.orderId AND
      o.ordertime BETWEEN s.shiptime - INTERVAL '4' HOUR AND s.shiptime
 
Non-windowed Join:
SELECT *
FROM Orders o, Shipments s
WHERE o.id = s.orderId

On Wed, Mar 7, 2018 at 7:02 PM, kant kodali <[hidden email]> wrote:
Hi!

Thanks for all this. and yes I was indeed talking about SQL/Table API so I will keep track of these tickets! BTW, What is non-windowed Join? I thought stream-stream-joins by default is a stateful operation so it has to be within some time window right? Also does the output of stream-stream joins emit every time so we can see the state of the join at any given time or only when the watermark elapses and join result fully materializes? 

On a side note, Full outer join seems to be the most useful for my use case. so the moment its available in master I can start playing and testing it!

On Tue, Mar 6, 2018 at 10:39 PM, Hequn Cheng <[hidden email]> wrote:
Hi Kant,

The stream-stream outer joins are work in progress now(left/right/full), and will probably be ready before the end of this month. You can check the progress from[1]. 

Best, Hequn


On Wed, Mar 7, 2018 at 1:01 PM, Xingcan Cui <[hidden email]> wrote:
Hi Kant,

I suppose you refer to the stream join in SQL/Table API since the outer join for windowed-streams can always be achieved with the `JoinFunction` in DataStream API.

There are two kinds of stream joins, namely, the time-windowed join and the non-windowed join in Flink SQL/Table API [1, 2]. The time-windowed outer join has been supported since version 1.5 and the non-windowed outer join is still work in progress.

Hope that helps.

Best,
Xingcan



On 7 Mar 2018, at 12:45 AM, kant kodali <[hidden email]> wrote:

Hi All,

Does Flink support stream-stream outer joins in the latest version?

Thanks!




Reply | Threaded
Open this post in threaded view
|

Re: Does Flink support stream-stream outer joins in the latest version?

kant kodali
Hi Cheng,

The docs here states full outer joins are only available for batch (I am not sure if I am reading that correctly). I am trying to understand how two unbounded streams can be joined like a batch? If we have to do batch join then it must be bounded right? If so, how do we bound? I can think Time Window is one way to bound but other than that if I execute the below join query on the unbounded stream I am not even sure how that works? A row from one table can join with a row from another table and that row can come anytime in future right if it is unbounded. so I am sorry I am failing to understand.


SELECT *
FROM Orders o, Shipments s
WHERE o.id = s.orderId

Thanks!

On Wed, Mar 7, 2018 at 3:49 AM, Hequn Cheng <[hidden email]> wrote:
Hi kant,

It seems that you mean the Time-windowed Join. The Time-windowed Joins are supported now. You can check more details with the docs given by Xingcan.
As for the non-window join, it is used to join two unbounded stream and the semantic is very like batch join.

Time-windowed Join:
SELECT *
FROM Orders o, Shipments s
WHERE o.id = s.orderId AND
      o.ordertime BETWEEN s.shiptime - INTERVAL '4' HOUR AND s.shiptime
 
Non-windowed Join:
SELECT *
FROM Orders o, Shipments s
WHERE o.id = s.orderId

On Wed, Mar 7, 2018 at 7:02 PM, kant kodali <[hidden email]> wrote:
Hi!

Thanks for all this. and yes I was indeed talking about SQL/Table API so I will keep track of these tickets! BTW, What is non-windowed Join? I thought stream-stream-joins by default is a stateful operation so it has to be within some time window right? Also does the output of stream-stream joins emit every time so we can see the state of the join at any given time or only when the watermark elapses and join result fully materializes? 

On a side note, Full outer join seems to be the most useful for my use case. so the moment its available in master I can start playing and testing it!

On Tue, Mar 6, 2018 at 10:39 PM, Hequn Cheng <[hidden email]> wrote:
Hi Kant,

The stream-stream outer joins are work in progress now(left/right/full), and will probably be ready before the end of this month. You can check the progress from[1]. 

Best, Hequn


On Wed, Mar 7, 2018 at 1:01 PM, Xingcan Cui <[hidden email]> wrote:
Hi Kant,

I suppose you refer to the stream join in SQL/Table API since the outer join for windowed-streams can always be achieved with the `JoinFunction` in DataStream API.

There are two kinds of stream joins, namely, the time-windowed join and the non-windowed join in Flink SQL/Table API [1, 2]. The time-windowed outer join has been supported since version 1.5 and the non-windowed outer join is still work in progress.

Hope that helps.

Best,
Xingcan



On 7 Mar 2018, at 12:45 AM, kant kodali <[hidden email]> wrote:

Hi All,

Does Flink support stream-stream outer joins in the latest version?

Thanks!





Reply | Threaded
Open this post in threaded view
|

Re: Does Flink support stream-stream outer joins in the latest version?

Xingcan Cui
Hi Kant,

the non windowed stream-stream join is not equivalent to the full-history join, though they get the same SQL form. The retention times for records must be set to leverage the storage consumption and completeness of the results.

Best,
Xingcan

On 7 Mar 2018, at 8:02 PM, kant kodali <[hidden email]> wrote:

Hi Cheng,

The docs here states full outer joins are only available for batch (I am not sure if I am reading that correctly). I am trying to understand how two unbounded streams can be joined like a batch? If we have to do batch join then it must be bounded right? If so, how do we bound? I can think Time Window is one way to bound but other than that if I execute the below join query on the unbounded stream I am not even sure how that works? A row from one table can join with a row from another table and that row can come anytime in future right if it is unbounded. so I am sorry I am failing to understand.


SELECT *
FROM Orders o, Shipments s
WHERE o.id = s.orderId

Thanks!

On Wed, Mar 7, 2018 at 3:49 AM, Hequn Cheng <[hidden email]> wrote:
Hi kant,

It seems that you mean the Time-windowed Join. The Time-windowed Joins are supported now. You can check more details with the docs given by Xingcan.
As for the non-window join, it is used to join two unbounded stream and the semantic is very like batch join.

Time-windowed Join:
SELECT *
FROM Orders o, Shipments s
WHERE o.id = s.orderId AND
      o.ordertime BETWEEN s.shiptime - INTERVAL '4' HOUR AND s.shiptime
 
Non-windowed Join:
SELECT *
FROM Orders o, Shipments s
WHERE o.id = s.orderId

On Wed, Mar 7, 2018 at 7:02 PM, kant kodali <[hidden email]> wrote:
Hi!

Thanks for all this. and yes I was indeed talking about SQL/Table API so I will keep track of these tickets! BTW, What is non-windowed Join? I thought stream-stream-joins by default is a stateful operation so it has to be within some time window right? Also does the output of stream-stream joins emit every time so we can see the state of the join at any given time or only when the watermark elapses and join result fully materializes? 

On a side note, Full outer join seems to be the most useful for my use case. so the moment its available in master I can start playing and testing it!

On Tue, Mar 6, 2018 at 10:39 PM, Hequn Cheng <[hidden email]> wrote:
Hi Kant,

The stream-stream outer joins are work in progress now(left/right/full), and will probably be ready before the end of this month. You can check the progress from[1]. 

Best, Hequn


On Wed, Mar 7, 2018 at 1:01 PM, Xingcan Cui <[hidden email]> wrote:
Hi Kant,

I suppose you refer to the stream join in SQL/Table API since the outer join for windowed-streams can always be achieved with the `JoinFunction` in DataStream API.

There are two kinds of stream joins, namely, the time-windowed join and the non-windowed join in Flink SQL/Table API [1, 2]. The time-windowed outer join has been supported since version 1.5 and the non-windowed outer join is still work in progress.

Hope that helps.

Best,
Xingcan



On 7 Mar 2018, at 12:45 AM, kant kodali <[hidden email]> wrote:

Hi All,

Does Flink support stream-stream outer joins in the latest version?

Thanks!






Reply | Threaded
Open this post in threaded view
|

Re: Does Flink support stream-stream outer joins in the latest version?

Hequn Cheng
In reply to this post by kant kodali
Hi kant,

You are right. Batch joins require the inputs are bounded. To join two unbounded streams without window, all data will be stored in join's states, so the late right row will join the previous left row when it is input. 
As for state retention time, if the input tables of join are both keyed table and key number of the keyed tables are limited, then you don't have to set state retention time, otherwise it is suggested to set the state retention time.

Best, Hequn

On Wed, Mar 7, 2018 at 8:02 PM, kant kodali <[hidden email]> wrote:
Hi Cheng,

The docs here states full outer joins are only available for batch (I am not sure if I am reading that correctly). I am trying to understand how two unbounded streams can be joined like a batch? If we have to do batch join then it must be bounded right? If so, how do we bound? I can think Time Window is one way to bound but other than that if I execute the below join query on the unbounded stream I am not even sure how that works? A row from one table can join with a row from another table and that row can come anytime in future right if it is unbounded. so I am sorry I am failing to understand.


SELECT *
FROM Orders o, Shipments s
WHERE o.id = s.orderId

Thanks!

On Wed, Mar 7, 2018 at 3:49 AM, Hequn Cheng <[hidden email]> wrote:
Hi kant,

It seems that you mean the Time-windowed Join. The Time-windowed Joins are supported now. You can check more details with the docs given by Xingcan.
As for the non-window join, it is used to join two unbounded stream and the semantic is very like batch join.

Time-windowed Join:
SELECT *
FROM Orders o, Shipments s
WHERE o.id = s.orderId AND
      o.ordertime BETWEEN s.shiptime - INTERVAL '4' HOUR AND s.shiptime
 
Non-windowed Join:
SELECT *
FROM Orders o, Shipments s
WHERE o.id = s.orderId

On Wed, Mar 7, 2018 at 7:02 PM, kant kodali <[hidden email]> wrote:
Hi!

Thanks for all this. and yes I was indeed talking about SQL/Table API so I will keep track of these tickets! BTW, What is non-windowed Join? I thought stream-stream-joins by default is a stateful operation so it has to be within some time window right? Also does the output of stream-stream joins emit every time so we can see the state of the join at any given time or only when the watermark elapses and join result fully materializes? 

On a side note, Full outer join seems to be the most useful for my use case. so the moment its available in master I can start playing and testing it!

On Tue, Mar 6, 2018 at 10:39 PM, Hequn Cheng <[hidden email]> wrote:
Hi Kant,

The stream-stream outer joins are work in progress now(left/right/full), and will probably be ready before the end of this month. You can check the progress from[1]. 

Best, Hequn


On Wed, Mar 7, 2018 at 1:01 PM, Xingcan Cui <[hidden email]> wrote:
Hi Kant,

I suppose you refer to the stream join in SQL/Table API since the outer join for windowed-streams can always be achieved with the `JoinFunction` in DataStream API.

There are two kinds of stream joins, namely, the time-windowed join and the non-windowed join in Flink SQL/Table API [1, 2]. The time-windowed outer join has been supported since version 1.5 and the non-windowed outer join is still work in progress.

Hope that helps.

Best,
Xingcan



On 7 Mar 2018, at 12:45 AM, kant kodali <[hidden email]> wrote:

Hi All,

Does Flink support stream-stream outer joins in the latest version?

Thanks!