GROUP BY TUMBLE on ROW range

classic Classic list List threaded Threaded
3 messages Options
Reply | Threaded
Open this post in threaded view
|

GROUP BY TUMBLE on ROW range

Stefano Bortoli-3

Hi all,

 

Is there a way to use a tumble window group by with row range in streamSQL?

 

I mean, something like this:

 

//      "SELECT COUNT(*) " +

//             "FROM T1 " +

//        "GROUP BY TUMBLE(rowtime, INTERVAL '2' ROWS PRECEDING )"

 

However, even looking at tests and looking at the “row interval expression generation” I could not find any examples in SQL. I know it is supported by the stream APIs, and countWindow is the chosen abstraction.

 

    table

      .window(Tumble over 2.rows on 'long as 'w)

      .groupBy('w)

      .select('int.count)

      .toDataSet[Row]

 

I fear I am missing something simple. Thanks a lot for the support guys!

 

Best,

Stefano

Reply | Threaded
Open this post in threaded view
|

Re: GROUP BY TUMBLE on ROW range

Fabian Hueske-2
Hi Stefano,

this is not supported in Flink's SQL and we would need new Group Window functions (like TUMBLE) for this.
A TUMBLE_COUNT function would be somewhat similar to SESSION, which also requires checks on the sorted neighboring rows to identify the window of a row.
Such a function would first need to be added to Calcite and then integrated with Flink.

A tumble count could also be expressed in plain SQL but wouldn't be very intuitive. You would have to
- define an over window (maybe partitioned on some key) sorted on time with a ROW_NUMBER function that assigns increasing numbers to rows.
- do a group by on the row number modulo the window size.

Btw. count windows are supported by the Table API.

Best, Fabian



2017-10-17 17:16 GMT+02:00 Stefano Bortoli <[hidden email]>:

Hi all,

 

Is there a way to use a tumble window group by with row range in streamSQL?

 

I mean, something like this:

 

//      "SELECT COUNT(*) " +

//             "FROM T1 " +

//        "GROUP BY TUMBLE(rowtime, INTERVAL '2' ROWS PRECEDING )"

 

However, even looking at tests and looking at the “row interval expression generation” I could not find any examples in SQL. I know it is supported by the stream APIs, and countWindow is the chosen abstraction.

 

    table

      .window(Tumble over 2.rows on 'long as 'w)

      .groupBy('w)

      .select('int.count)

      .toDataSet[Row]

 

I fear I am missing something simple. Thanks a lot for the support guys!

 

Best,

Stefano


Reply | Threaded
Open this post in threaded view
|

RE: GROUP BY TUMBLE on ROW range

Stefano Bortoli-3

Great, thanks for the explanation. I noticed now indeed that the examples are for the table API. I believe over window is sufficient for the purpose right now, was just curious.

 

Best,

Stefano

 

From: Fabian Hueske [mailto:[hidden email]]
Sent: Tuesday, October 17, 2017 9:24 PM
To: Stefano Bortoli <[hidden email]>
Cc: [hidden email]
Subject: Re: GROUP BY TUMBLE on ROW range

 

Hi Stefano,

this is not supported in Flink's SQL and we would need new Group Window functions (like TUMBLE) for this.

A TUMBLE_COUNT function would be somewhat similar to SESSION, which also requires checks on the sorted neighboring rows to identify the window of a row.

Such a function would first need to be added to Calcite and then integrated with Flink.

 

A tumble count could also be expressed in plain SQL but wouldn't be very intuitive. You would have to

- define an over window (maybe partitioned on some key) sorted on time with a ROW_NUMBER function that assigns increasing numbers to rows.

- do a group by on the row number modulo the window size.

Btw. count windows are supported by the Table API.

Best, Fabian

 

 

2017-10-17 17:16 GMT+02:00 Stefano Bortoli <[hidden email]>:

Hi all,

Is there a way to use a tumble window group by with row range in streamSQL?

I mean, something like this:

//      "SELECT COUNT(*) " +

//             "FROM T1 " +

//        "GROUP BY TUMBLE(rowtime, INTERVAL '2' ROWS PRECEDING )"

 

However, even looking at tests and looking at the “row interval expression generation” I could not find any examples in SQL. I know it is supported by the stream APIs, and countWindow is the chosen abstraction.

 

    table

      .window(Tumble over 2.rows on 'long as 'w)

      .groupBy('w)

      .select('int.count)

      .toDataSet[Row]

 

I fear I am missing something simple. Thanks a lot for the support guys!

 

Best,

Stefano