Re: Get Tumbling Window Top-K using SQL
Posted by
Jark Wu-3 on
URL: http://deprecated-apache-flink-user-mailing-list-archive.369.s1.nabble.com/Get-Tumbling-Window-Top-K-using-SQL-tp33236p33263.html
Hi Weizheng,
You are right. You can use the TopN feature in blink planner. But note that it doesn't support tumbling window topn, it is a topn without windowing and event-time.
But you can achieve it by PARTITIONED BY <time-window>, the <time-window> column could be a preprocessed column which represents which window does this row belongs to, e.g. 1-hour windowing: "2020-03-02 10:00", "2020-03-02 11:00".
The tumbling window topn will be natively supported in the future.
Best,
Jark
Sorry guys,
I find solution on wiki about Top-N using Blink planner.
SELECT [column_list]
FROM (
SELECT [column_list],
ROW_NUMBER() OVER ([PARTITION BY col1[, col2...]]
ORDER BY col1 [asc|desc][, col2 [asc|desc]...]) AS rownum
FROM table_name)
WHERE rownum <= N [AND conditions]
thanks anyway.
Hi,
The main idea is using a RANK to get the Top K of filed 'a':
SELECT a, b, c
FROM (
SELECT
a, b, c,
RANK() OVER (ORDER BY a PARTITION BY CEIL(t TO MINUTE) BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) as rank
FROM yourTable)
WHERE rank <= 10
is there better way to get tumbling window Top-K item now?
In the above wiki, I don't know why the query has a field 'lastLogin'
SELECT user, RANK() OVER (ORDER BY lastLogin)
FROM (
SELECT user, MAX(cTime) AS lastAction FROM clicks GROUP BY user
);
Thanks!