Get Tumbling Window Top-K using SQL

classic Classic list List threaded Threaded
3 messages Options
Reply | Threaded
Open this post in threaded view
|

Get Tumbling Window Top-K using SQL

Lu Weizheng
Hi,

I find a question on StackOverflow(https://stackoverflow.com/questions/49191326/flink-stream-sql-order-by) about how to get Top-K using Flink SQL, it was written by Fabian. It was backed in 2018.
The main idea is using a RANK to get the Top K of filed 'a':
SELECT a, b, c 
FROM (
  SELECT 
    a, b, c, 
    RANK() OVER (ORDER BY a PARTITION BY CEIL(t TO MINUTE) BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) as rank 
  FROM yourTable)
WHERE rank <= 10
is there better way to get tumbling window Top-K item now?


In the above wiki, I don't know why the query has a field 'lastLogin'

SELECT user, RANK() OVER (ORDER BY lastLogin)
FROM (
  SELECT user, MAX(cTime) AS lastAction FROM clicks GROUP BY user
);

Thanks!
Reply | Threaded
Open this post in threaded view
|

回复: Get Tumbling Window Top-K using SQL

Lu Weizheng
Sorry guys,

I find solution on wiki about Top-N using Blink planner.

SELECT [column_list]
FROM (
   SELECT [column_list],
     ROW_NUMBER() OVER ([PARTITION BY col1[, col2...]]
       ORDER BY col1 [asc|desc][, col2 [asc|desc]...]) AS rownum
   FROM table_name)
WHERE rownum <= N [AND conditions]

thanks anyway.

发件人: Lu Weizheng <[hidden email]>
发送时间: 2020年3月1日 17:48
收件人: [hidden email] <[hidden email]>
主题: Get Tumbling Window Top-K using SQL
 
Hi,

I find a question on StackOverflow(https://stackoverflow.com/questions/49191326/flink-stream-sql-order-by) about how to get Top-K using Flink SQL, it was written by Fabian. It was backed in 2018.
The main idea is using a RANK to get the Top K of filed 'a':
SELECT a, b, c 
FROM (
  SELECT 
    a, b, c, 
    RANK() OVER (ORDER BY a PARTITION BY CEIL(t TO MINUTE) BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) as rank 
  FROM yourTable)
WHERE rank <= 10
is there better way to get tumbling window Top-K item now?


In the above wiki, I don't know why the query has a field 'lastLogin'

SELECT user, RANK() OVER (ORDER BY lastLogin)
FROM (
  SELECT user, MAX(cTime) AS lastAction FROM clicks GROUP BY user
);

Thanks!
Reply | Threaded
Open this post in threaded view
|

Re: Get Tumbling Window Top-K using SQL

Jark Wu-3
Hi Weizheng,

You are right. You can use the TopN feature in blink planner. But note that it doesn't support tumbling window topn, it is a topn without windowing and event-time. 
But you can achieve it by PARTITIONED BY <time-window>, the <time-window> column could be a preprocessed column which represents which window does this row belongs to, e.g. 1-hour windowing: "2020-03-02 10:00", "2020-03-02 11:00".

The tumbling window topn will be natively supported in the future. 

Best,
Jark

On Mon, 2 Mar 2020 at 10:55, Lu Weizheng <[hidden email]> wrote:
Sorry guys,

I find solution on wiki about Top-N using Blink planner.

SELECT [column_list]
FROM (
   SELECT [column_list],
     ROW_NUMBER() OVER ([PARTITION BY col1[, col2...]]
       ORDER BY col1 [asc|desc][, col2 [asc|desc]...]) AS rownum
   FROM table_name)
WHERE rownum <= N [AND conditions]

thanks anyway.

发件人: Lu Weizheng <[hidden email]>
发送时间: 2020年3月1日 17:48
收件人: [hidden email] <[hidden email]>
主题: Get Tumbling Window Top-K using SQL
 
Hi,

I find a question on StackOverflow(https://stackoverflow.com/questions/49191326/flink-stream-sql-order-by) about how to get Top-K using Flink SQL, it was written by Fabian. It was backed in 2018.
The main idea is using a RANK to get the Top K of filed 'a':
SELECT a, b, c 
FROM (
  SELECT 
    a, b, c, 
    RANK() OVER (ORDER BY a PARTITION BY CEIL(t TO MINUTE) BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) as rank 
  FROM yourTable)
WHERE rank <= 10
is there better way to get tumbling window Top-K item now?


In the above wiki, I don't know why the query has a field 'lastLogin'

SELECT user, RANK() OVER (ORDER BY lastLogin)
FROM (
  SELECT user, MAX(cTime) AS lastAction FROM clicks GROUP BY user
);

Thanks!