Preserve record orders after WINDOW function

classic Classic list List threaded Threaded
4 messages Options
Reply | Threaded
Open this post in threaded view
|

Preserve record orders after WINDOW function

Jiahui Jiang
Hello! I'm writing a SQL query with a OVER window function ordered by processing time.

I'm wondering since timestamp is only millisecond granularity.

For a query using over window and sorted on processing time column, for example,
```
SELECT col1, 
     max(col2) OVER (PARTITION BY col1, ORDER BY _processing_time_column)
FROM table
```

If 2 records have the same processing timestamp (they arrived the operator at the exact same millisecond), is the order guaranteed to be preserved after the window function? If not, what would to the recommended workaround to keep the order?

Thank you!
Jiahui
Reply | Threaded
Open this post in threaded view
|

Re: Preserve record orders after WINDOW function

Jark Wu-3
Hi Jiahui,

Yes, if they arrive at the same millisecond, they are perserved in the arriving order.

Best,
Jark



On Mon, 11 May 2020 at 23:17, Jiahui Jiang <[hidden email]> wrote:
Hello! I'm writing a SQL query with a OVER window function ordered by processing time.

I'm wondering since timestamp is only millisecond granularity.

For a query using over window and sorted on processing time column, for example,
```
SELECT col1, 
     max(col2) OVER (PARTITION BY col1, ORDER BY _processing_time_column)
FROM table
```

If 2 records have the same processing timestamp (they arrived the operator at the exact same millisecond), is the order guaranteed to be preserved after the window function? If not, what would to the recommended workaround to keep the order?

Thank you!
Jiahui
Reply | Threaded
Open this post in threaded view
|

Re: Preserve record orders after WINDOW function

Jiahui Jiang
Thank you for confirming!

Just want to make sure my understanding of the internal implementation is correct:

When applying an over window and ordered by processing time using SQL, the datastream plan it translates into doesn't actually have an order by logic. It just sequentially process all the elements that fall into this window in their order inside the stream. Is that correct?

Thanks again! 😊

From: Jark Wu <[hidden email]>
Sent: Monday, May 11, 2020 8:52 PM
To: Jiahui Jiang <[hidden email]>
Cc: [hidden email] <[hidden email]>
Subject: Re: Preserve record orders after WINDOW function
 
Hi Jiahui,

Yes, if they arrive at the same millisecond, they are perserved in the arriving order.

Best,
Jark



On Mon, 11 May 2020 at 23:17, Jiahui Jiang <[hidden email]> wrote:
Hello! I'm writing a SQL query with a OVER window function ordered by processing time.

I'm wondering since timestamp is only millisecond granularity.

For a query using over window and sorted on processing time column, for example,
```
SELECT col1, 
     max(col2) OVER (PARTITION BY col1, ORDER BY _processing_time_column)
FROM table
```

If 2 records have the same processing timestamp (they arrived the operator at the exact same millisecond), is the order guaranteed to be preserved after the window function? If not, what would to the recommended workaround to keep the order?

Thank you!
Jiahui
Reply | Threaded
Open this post in threaded view
|

Re: Preserve record orders after WINDOW function

Jark Wu-3
Yes, that's right. 

On Tue, 12 May 2020 at 10:55, Jiahui Jiang <[hidden email]> wrote:
Thank you for confirming!

Just want to make sure my understanding of the internal implementation is correct:

When applying an over window and ordered by processing time using SQL, the datastream plan it translates into doesn't actually have an order by logic. It just sequentially process all the elements that fall into this window in their order inside the stream. Is that correct?

Thanks again! 😊

From: Jark Wu <[hidden email]>
Sent: Monday, May 11, 2020 8:52 PM
To: Jiahui Jiang <[hidden email]>
Cc: [hidden email] <[hidden email]>
Subject: Re: Preserve record orders after WINDOW function
 
Hi Jiahui,

Yes, if they arrive at the same millisecond, they are perserved in the arriving order.

Best,
Jark



On Mon, 11 May 2020 at 23:17, Jiahui Jiang <[hidden email]> wrote:
Hello! I'm writing a SQL query with a OVER window function ordered by processing time.

I'm wondering since timestamp is only millisecond granularity.

For a query using over window and sorted on processing time column, for example,
```
SELECT col1, 
     max(col2) OVER (PARTITION BY col1, ORDER BY _processing_time_column)
FROM table
```

If 2 records have the same processing timestamp (they arrived the operator at the exact same millisecond), is the order guaranteed to be preserved after the window function? If not, what would to the recommended workaround to keep the order?

Thank you!
Jiahui