instance number of user defined function

lec ssmi
Hi,

I have always wondered how many instances of a user-defined function are created across a whole Flink application.
Suppose the following scenario:
    I have a UDTF called 'mongo_join', through which a Flink table can join with different external Mongo tables, depending on the parameter passed in.
    I have a SQL table called trade. In the pipeline, I join the trade table with item and with payment. The SQL statements are as follows:

        create view trade_payment as select trade_id, payment_id from trade, lateral table (mongo_join('payment')) as T(payment_id);
        create view trade_item as select trade_id, item_id from trade, lateral table (mongo_join('item')) as T(item_id);

As you might expect, I use member variables in the UDTF instance to store the different MongoConnection objects.
So, will there be concurrency problems? And how are the instances of the function distributed?

Thanks!
    
Re: instance number of user defined function

godfrey he
Hi,

A UDTF is wrapped into an operator, and each instance of that operator is executed in a slot (that is, by one parallel thread).
For more details on operators, tasks, and slots, see [1].
A TM (a JVM process) may have multiple slots, which means a single JVM process may hold multiple UDTF instances.
It is best to keep your UDTF stateless; otherwise you need to take care of thread safety.
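
To make the point about multiple instances per JVM concrete, here is a minimal sketch in plain Java. It does not use the Flink API: `SlotSimulation`, `MongoJoinSketch`, and `FakeConnection` are hypothetical names invented for illustration, and each thread merely stands in for one slot. It assumes every parallel instance gets its own function object and creates its connections in an `open()`-style hook, which is roughly where a real `TableFunction` would do it via `open(FunctionContext)` and `close()`.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Simulates several slots in one JVM, each owning its own function instance. */
class SlotSimulation {
    /** Runs `parallelism` threads and returns the lookup count seen by each instance. */
    static int[] run(int parallelism, int recordsPerSlot) {
        List<MongoJoinSketch> instances = new ArrayList<>();
        List<Thread> slots = new ArrayList<>();
        for (int i = 0; i < parallelism; i++) {
            MongoJoinSketch fn = new MongoJoinSketch(); // one instance per slot
            instances.add(fn);
            Thread slot = new Thread(() -> {
                fn.open();
                for (int r = 0; r < recordsPerSlot; r++) {
                    fn.eval("payment", "trade-" + r);
                }
            });
            slots.add(slot);
            slot.start();
        }
        try {
            for (Thread slot : slots) {
                slot.join(); // join() makes each slot's writes visible here
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        int[] counts = new int[parallelism];
        for (int i = 0; i < parallelism; i++) {
            counts[i] = instances.get(i).lookups("payment");
        }
        return counts;
    }

    public static void main(String[] args) {
        for (int count : run(4, 1000)) {
            System.out.println(count); // each instance counts only its own records
        }
    }
}

/** Hypothetical stand-in for the UDTF: all state lives in instance fields. */
class MongoJoinSketch {
    // Per-instance map, created in open(); deliberately NOT static.
    private Map<String, FakeConnection> connections;

    void open() {
        connections = new HashMap<>();
    }

    String eval(String collection, String key) {
        return connections.computeIfAbsent(collection, FakeConnection::new).lookup(key);
    }

    int lookups(String collection) {
        FakeConnection c = connections.get(collection);
        return c == null ? 0 : c.lookups;
    }
}

/** Hypothetical, deliberately non-thread-safe connection; not a real driver class. */
class FakeConnection {
    final String collection;
    int lookups = 0;

    FakeConnection(String collection) {
        this.collection = collection;
    }

    String lookup(String key) {
        lookups++; // unsynchronized on purpose
        return collection + ":" + key;
    }
}
```

Because each simulated slot touches only its own instance, the unsynchronized counter never races. Making `connections` a `static` field would share one map across every slot in the JVM, and that is the case where thread safety becomes a real concern.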


Best,
Godfrey



In reply to lec ssmi <[hidden email]>, Thursday, April 16, 2020, 6:20 PM.

Re: instance number of user defined function

lec ssmi
Thank you for your reply.