Hi all,
I am trying to use Flink SQL to run a Hive task. I use tEnv.sqlUpdate to execute my SQL, which looks like "INSERT OVERWRITE ... SELECT ...". But I find that the parallelism of the sink is always 1, which is intolerable for large data. Why does this happen? Also, is there any guide for deciding the taskmanager memory when I have two huge tables to hash join, for example, each table with several TB of data?

Thanks,
Faaron
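For context, a minimal sketch of the setup being described, assuming Flink 1.10 with the blink planner and a Hive catalog; the catalog name, table names, and conf directory below are illustrative, not taken from the thread:

```java
import org.apache.flink.table.api.EnvironmentSettings;
import org.apache.flink.table.api.TableEnvironment;
import org.apache.flink.table.catalog.hive.HiveCatalog;

public class HiveInsertOverwrite {
    public static void main(String[] args) throws Exception {
        // Batch mode with the blink planner (Flink 1.10-style API)
        EnvironmentSettings settings = EnvironmentSettings.newInstance()
                .useBlinkPlanner().inBatchMode().build();
        TableEnvironment tEnv = TableEnvironment.create(settings);

        // Register a Hive catalog; "myhive", "default", and the conf dir are illustrative
        HiveCatalog hive = new HiveCatalog("myhive", "default", "/etc/hive/conf");
        tEnv.registerCatalog("myhive", hive);
        tEnv.useCatalog("myhive");

        // The kind of statement in question; the sink parallelism is decided
        // by the planner and the sink implementation, not by this call
        tEnv.sqlUpdate("INSERT OVERWRITE target_table SELECT * FROM source_table");
        tEnv.execute("hive insert overwrite");
    }
}
```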
Hi faaron,

For the sink parallelism:
- What is the parallelism of the input of the sink? The sink parallelism should be the same.
- Does your SQL have an ORDER BY or LIMIT? Flink batch SQL does not support range partition now, so it will use a single parallelism to run ORDER BY.

For the taskmanager memory, there is a managed memory option to configure.

Best,
Jingsong Lee

On Fri, Mar 6, 2020 at 5:38 PM faaron zheng <[hidden email]> wrote:
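For reference, the managed memory mentioned above is configured in flink-conf.yaml; the option names below are from Flink 1.10, and the sizes are illustrative, not recommendations for multi-TB joins:

```yaml
# Total memory of the TaskManager process (illustrative size)
taskmanager.memory.process.size: 8g

# Managed memory is what batch operators such as hash join use for
# in-memory hash tables and spilling. Configure either an absolute size ...
taskmanager.memory.managed.size: 4g
# ... or a fraction of total Flink memory (default 0.4)
# taskmanager.memory.managed.fraction: 0.4
```

For joins larger than managed memory, the hash join spills to disk, so more managed memory mainly reduces spilling rather than being a hard requirement.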
Thanks for your attention. The input of the sink is 500, and there is no ORDER BY or LIMIT.

On Fri, Mar 6, 2020 at 6:15 PM, Jingsong Li <[hidden email]> wrote:
Which sink do you use? It depends on the sink implementation, like [1].

Best,
Jingsong Lee

On Fri, Mar 6, 2020 at 6:37 PM faaron zheng <[hidden email]> wrote:
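As a side note, when the sink implementation does not fix its own parallelism, the blink planner's default operator parallelism can be set in flink-conf.yaml; the option below exists in Flink 1.10, and the value is illustrative:

```yaml
# Default parallelism for blink-planner batch operators (illustrative value)
table.exec.resource.default-parallelism: 500
```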