The parallelism of sink is always 1 in sqlUpdate


faaron zheng
Hi all,

I am trying to use Flink SQL to run a Hive job. I execute my SQL with tEnv.sqlUpdate; the statement looks like "insert overwrite ... select ...". However, the parallelism of the sink is always 1, which is intolerable for large data. Why does this happen? Also, is there any guide for sizing TaskManager memory when two huge tables are hash-joined, for example when each table holds several TB of data?

Thanks,
Faaron
Re: The parallelism of sink is always 1 in sqlUpdate

Jingsong Li
Hi faaron,

Regarding sink parallelism:
- What is the parallelism of the sink's input? The sink parallelism should be the same.
- Does your SQL have an order by or a limit?
Flink batch SQL does not support range partitioning yet, so it runs order by with a parallelism of 1.
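
To make the two checks above concrete, here is a toy model in Java. It is not Flink's planner logic, only a restatement of the rules as described in this reply. The class and method names are hypothetical, and treating limit the same way as order by is an assumption (the reply states the single-parallelism behavior only for order by).

```java
// Toy model of the two checks above. This is NOT Flink code and does not
// call any Flink API; names are hypothetical and for illustration only.
final class SinkParallelismCheck {

    /**
     * Returns the sink parallelism one would expect from the rules stated in
     * the thread: the sink follows its input unless a global operator
     * forces the plan down to a single task.
     *
     * Assumption: a limit is treated like an order by here.
     */
    static int expectedSinkParallelism(int inputParallelism,
                                       boolean hasOrderBy,
                                       boolean hasLimit) {
        if (inputParallelism < 1) {
            throw new IllegalArgumentException("inputParallelism must be >= 1");
        }
        if (hasOrderBy || hasLimit) {
            // No range partitioning in batch SQL, so the sort runs in one task.
            return 1;
        }
        // Otherwise the sink should simply inherit the input parallelism.
        return inputParallelism;
    }

    public static void main(String[] args) {
        System.out.println(expectedSinkParallelism(500, false, false)); // prints 500
        System.out.println(expectedSinkParallelism(500, true, false));  // prints 1
    }
}
```

If the observed sink parallelism differs from what this model predicts, the cause is probably something other than the two checks listed here.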

Regarding TaskManager memory:
There is a managed memory option that you can configure.
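
As a rough illustration of what that option controls, the sketch below does the arithmetic in plain Java. It assumes the Flink 1.10 memory model, where managed memory is either set directly (taskmanager.memory.managed.size) or derived as a fraction of total Flink memory (taskmanager.memory.managed.fraction), and where it is shared evenly among the task slots. The class name, method names, and the numbers are hypothetical; this is not a sizing recommendation for a multi-TB hash join.

```java
// Back-of-the-envelope arithmetic only; does not call any Flink API.
// Assumes the Flink 1.10 memory model; all names and values are illustrative.
final class ManagedMemoryEstimate {

    /** Managed memory derived from total Flink memory and a fraction in [0, 1]. */
    static long managedBytes(long totalFlinkMemoryBytes, double managedFraction) {
        if (totalFlinkMemoryBytes < 0) {
            throw new IllegalArgumentException("total memory must be >= 0");
        }
        if (managedFraction < 0.0 || managedFraction > 1.0) {
            throw new IllegalArgumentException("fraction must be in [0, 1]");
        }
        return (long) (totalFlinkMemoryBytes * managedFraction);
    }

    /** Even share of managed memory per task slot (assumed equal split). */
    static long managedBytesPerSlot(long managedBytes, int taskSlots) {
        if (taskSlots < 1) {
            throw new IllegalArgumentException("taskSlots must be >= 1");
        }
        return managedBytes / taskSlots;
    }

    public static void main(String[] args) {
        long totalFlinkMemory = 8L * 1024 * 1024 * 1024; // 8 GiB, illustrative
        long managed = managedBytes(totalFlinkMemory, 0.4); // 0.4 is an example value
        long perSlot = managedBytesPerSlot(managed, 4);     // 4 slots, example value
        System.out.println("managed bytes:  " + managed);
        System.out.println("per-slot bytes: " + perSlot);
    }
}
```

The per-slot figure is the number that matters for a hash join, since each join task can only use its own slot's share before it has to spill to disk.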


Best,
Jingsong Lee

On Fri, Mar 6, 2020 at 5:38 PM faaron zheng <[hidden email]> wrote:
Hi all,

I am trying to use Flink SQL to run a Hive job. I execute my SQL with tEnv.sqlUpdate; the statement looks like "insert overwrite ... select ...". However, the parallelism of the sink is always 1, which is intolerable for large data. Why does this happen? Also, is there any guide for sizing TaskManager memory when two huge tables are hash-joined, for example when each table holds several TB of data?

Thanks,
Faaron



Re: The parallelism of sink is always 1 in sqlUpdate

faaron zheng
Thanks for your attention. The parallelism of the sink's input is 500, and the query has no order by or limit.

On Fri, Mar 6, 2020 at 6:15 PM Jingsong Li <[hidden email]> wrote:
Hi faaron,

Regarding sink parallelism:
- What is the parallelism of the sink's input? The sink parallelism should be the same.
- Does your SQL have an order by or a limit?
Flink batch SQL does not support range partitioning yet, so it runs order by with a parallelism of 1.

Regarding TaskManager memory:
There is a managed memory option that you can configure.


Best,
Jingsong Lee


Re: The parallelism of sink is always 1 in sqlUpdate

Jingsong Li

On Fri, Mar 6, 2020 at 6:37 PM faaron zheng <[hidden email]> wrote:
Thanks for your attention. The parallelism of the sink's input is 500, and the query has no order by or limit.
