batch range sort support


batch range sort support

Benchao Li
Hi,

Currently the sort operator in the blink planner is global, which becomes a bottleneck when sorting a large amount of data.

I found the 'table.exec.range-sort.enabled' config in BatchExecSortRule, which got me very excited.
After enabling it, however, I found that it is not completely implemented yet. The config changes the required
distribution of the sort operator from SINGLETON to range, but BatchExecExchange does not handle range
distribution and throws an UnsupportedOperationException.
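
To make the difference between a global (SINGLETON) sort and a range-distributed sort concrete, here is a minimal Python sketch. It is not Flink code and does not reflect how the blink planner implements or would implement range distribution; the function name `range_partition_sort` and its parameters are hypothetical, chosen only for illustration. The idea is to sample the input to estimate split points, route each record to the partition covering its range, sort the partitions independently, and read them back in partition order.

```python
import bisect
import random


def range_partition_sort(records, num_partitions, sample_size=100, seed=0):
    """Sort records by range-partitioning them first (illustrative sketch only)."""
    if not records:
        return []
    rng = random.Random(seed)
    # Sample the input to estimate where the range boundaries should fall.
    sample = sorted(rng.sample(records, min(sample_size, len(records))))
    # Pick num_partitions - 1 split points at evenly spaced positions in the sample.
    boundaries = [
        sample[len(sample) * i // num_partitions]
        for i in range(1, num_partitions)
    ]
    # Route each record to the partition whose key range contains it.
    partitions = [[] for _ in range(num_partitions)]
    for record in records:
        partitions[bisect.bisect_right(boundaries, record)].append(record)
    # Each partition is sorted on its own; a parallel engine would do this
    # on separate tasks instead of in a single loop.
    sorted_partitions = [sorted(p) for p in partitions]
    # Partitions cover disjoint, increasing ranges, so concatenating them
    # in partition order yields a globally sorted result.
    return [record for part in sorted_partitions for record in part]
```

With a SINGLETON distribution all records would land in one partition and a single task would do all of the sorting, which is the bottleneck described above.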

My questions are:
1. Was this config merged by mistake when blink was merged into Flink, with no actual plan to implement it?
2. If it is planned, in which version can we expect it to be ready?


--
Benchao Li
School of Electronics Engineering and Computer Science, Peking University
Tel:+86-15650713730
Email: [hidden email]; [hidden email]

Re: batch range sort support

Jingsong Li
Hi, Benchao,

Glad to see your requirement for range partitioning.
I have a branch that supports range partitioning: [1]

Can you describe your scenario in more detail? Which sink do your jobs use? Could you give a simple but complete business scenario? This would help the community judge the importance of range partitioning.


Best,
Jingsong Lee

(In reply to Benchao Li's message of Thu, Apr 23, 2020, 12:15 PM.)

Re: batch range sort support

Benchao Li
Hi Jingsong,

Thanks for your quick response. I've CC'ed Chongchen, who understands the scenario much better.


(In reply to Jingsong Li's message of Thu, Apr 23, 2020, 12:34 PM.)

Re: batch range sort support

Kurt Young
Hi Benchao, you can create a Jira issue to track this.

Best,
Kurt


(In reply to Benchao Li's message of Thu, Apr 23, 2020, 2:27 PM.)

Re: batch range sort support

Benchao Li
Hi Kurt,

I've created a Jira issue [1] to track this. We can move further discussion there.


(In reply to Kurt Young's message of Thu, Apr 23, 2020, 10:25 PM.)