[Survey] Demand collection for stream SQL window join

Danny Chan
Hi all, I'd like to collect some use cases for the window join [1], which is already supported in the DataStream API. The purpose is to decide whether to support it on the SQL side as well. For example, a join of two tumbling windows might look like this:

```sql
select ..., window_start, window_end
-- assign each row of table_a to a 1-minute tumbling window on rowtime
from TABLE(
  TUMBLE(
    DATA => TABLE table_a,
    TIMECOL => DESCRIPTOR(rowtime),
    SIZE => INTERVAL '1' MINUTE)) tumble_a
-- [LEFT | RIGHT | FULL OUTER] marks the optional outer-join variants
[LEFT | RIGHT | FULL OUTER] JOIN TABLE(
  TUMBLE(
    DATA => TABLE table_b,
    TIMECOL => DESCRIPTOR(rowtime),
    SIZE => INTERVAL '1' MINUTE)) tumble_b
on tumble_a.col1 = tumble_b.col1 and ...
```

I have had some offline discussions with several companies (Tencent, Bytedance and Meituan), and it seems that the interval join is the most common case, while window join use cases are rare, so I'm looking forward to some feedback here.

In particular, I'd appreciate it if you could share your window join use cases (whether built on the Flink DataStream API or in other programs) and explain why a window join is a must, i.e. why it cannot be replaced with a regular stream join or an interval join.
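To make the window join vs. interval join distinction concrete, here is a minimal Python sketch of the two matching rules. It assumes inner-join semantics only and ignores watermarks, state cleanup and outer-join padding; the `Row` type, the function names and the integer-seconds timestamps are hypothetical illustrations, not Flink APIs. A tumbling window join pairs rows only when they fall into the same aligned window, while an interval join pairs rows whose timestamps lie within a bound relative to each other, so the two can disagree near window boundaries.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Row:
    """A keyed event; `ts` is a hypothetical integer event time in seconds."""
    key: str
    ts: int


def tumbling_window_join(left, right, size):
    """Inner window join: pair rows with equal keys that fall into the
    same aligned tumbling window [k * size, (k + 1) * size)."""
    result = []
    for lrow in left:
        for rrow in right:
            if lrow.key == rrow.key and lrow.ts // size == rrow.ts // size:
                window_start = (lrow.ts // size) * size
                result.append((lrow, rrow, window_start, window_start + size))
    return result


def interval_join(left, right, lower, upper):
    """Inner interval join: pair rows with equal keys whose timestamps
    satisfy lrow.ts + lower <= rrow.ts <= lrow.ts + upper."""
    return [
        (lrow, rrow)
        for lrow in left
        for rrow in right
        if lrow.key == rrow.key and lrow.ts + lower <= rrow.ts <= lrow.ts + upper
    ]


# Two events only 20 seconds apart, but on opposite sides of a minute boundary.
a, b = Row("k1", 50), Row("k1", 70)
print(tumbling_window_join([a], [b], size=60))       # empty: different windows
print(interval_join([a], [b], lower=-60, upper=60))  # pairs a with b
```

Conversely, rows at 0 s and 59 s share the window [0, 60) and are paired by the window join, yet an interval join with bounds of ±30 s would drop them. Which of the two semantics a use case really needs is exactly what the question above is asking.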

Thanks in advance ~

[1] https://ci.apache.org/projects/flink/flink-docs-release-1.11/dev/stream/operators/joining.html

Best,
Danny Chan

Re: [Survey] Demand collection for stream SQL window join

Jark Wu
Thanks for the survey!

I'm also interested in the use cases of the DataStream window join.

Best,
Jark

On Thu, 27 Aug 2020 at 14:40, Danny Chan wrote the survey message above.