improve the performance of flink sql job which lookup 40+ table

classic Classic list List threaded Threaded
2 messages Options
Reply | Threaded
Open this post in threaded view
|

improve the performance of flink sql job which lookup 40+ table

snack white
HI:
      My  flink  version is 1.10  use per-job mode , my   sql like 

```
select
  column1, t2.xx2, t3.xx3,t4.xx4
  …  t40.xx40
from
  main_table 
  left join lookup_1 FOR SYSTEM_TIME AS OF t1.proc_time AS t2 on t1.xx= t2.xx
  left join lookup_2 FOR SYSTEM_TIME AS OF t1.proc_time AS t3 on t1.xx= t3.xx
  left join lookup_3 FOR SYSTEM_TIME AS OF t1.proc_time AS t4 on t1.xx= t4.xx
  left join lookup_4 FOR SYSTEM_TIME AS OF t1.proc_time AS t5 on t1.xx= t5.xx
  left join lookup_5 FOR SYSTEM_TIME AS OF t1.proc_time AS t6 on t1.xx= t6.xx
  left join lookup_6 FOR SYSTEM_TIME AS OF t1.proc_time AS t7 on t1.xx= t7.xx
...

  left join lookup_40 FOR SYSTEM_TIME AS OF t1.proc_time AS t40 on t1.xx= t40.xx
```

I have developed the async lookup feature , but that is not enough, maybe the current look up table is serial not parallelism ?  

Now I need help about how can I  improve the performance of my sql job .

Best 
White 


Reply | Threaded
Open this post in threaded view
|

Re: improve the performance of flink sql job which lookup 40+ table

Jark Wu-3
Hi,

Yes, currently, multiple lookup join is not parallel and execute one by one. 
Async lookup + cache is the suggested way to improve performance. 
If the lookup tables are not large, you can also implement a ALL cache for the LookupTableSource to cache all the data in the database, and reload periodically. 

In Flink 1.12, we will support temporal join changelog [1] which will join the changelog stream instead of lookup database,
 this will greatly improve the performance and have lowest latency. 

Best,
Jark





On Tue, 28 Jul 2020 at 11:43, snack white <[hidden email]> wrote:
HI:
      My  flink  version is 1.10  use per-job mode , my   sql like 

```
select
  column1, t2.xx2, t3.xx3,t4.xx4
  …  t40.xx40
from
  main_table 
  left join lookup_1 FOR SYSTEM_TIME AS OF t1.proc_time AS t2 on t1.xx= t2.xx
  left join lookup_2 FOR SYSTEM_TIME AS OF t1.proc_time AS t3 on t1.xx= t3.xx
  left join lookup_3 FOR SYSTEM_TIME AS OF t1.proc_time AS t4 on t1.xx= t4.xx
  left join lookup_4 FOR SYSTEM_TIME AS OF t1.proc_time AS t5 on t1.xx= t5.xx
  left join lookup_5 FOR SYSTEM_TIME AS OF t1.proc_time AS t6 on t1.xx= t6.xx
  left join lookup_6 FOR SYSTEM_TIME AS OF t1.proc_time AS t7 on t1.xx= t7.xx
...

  left join lookup_40 FOR SYSTEM_TIME AS OF t1.proc_time AS t40 on t1.xx= t40.xx
```

I have developed the async lookup feature , but that is not enough, maybe the current look up table is serial not parallelism ?  

Now I need help about how can I  improve the performance of my sql job .

Best 
White