[PyFlink] update udf functions on the fly

classic Classic list List threaded Threaded
5 messages Options
Reply | Threaded
Open this post in threaded view
|

[PyFlink] update udf functions on the fly

Rinat
Hi mates !

I'm in the beginning of the road of building a recommendation pipeline on top of Flink.
I'm going to register a list of UDF python functions on job startups where each UDF is an ML model.

Over time new model versions appear in the ML registry and I would like to update my UDF functions on the fly without need to restart the whole job.
Could you tell me, whether it's possible or not ? Maybe the community can give advice on how such tasks can be solved using Flink and what other approaches exist.

Thanks a lot for your help and advice !


Reply | Threaded
Open this post in threaded view
|

Re: [PyFlink] update udf functions on the fly

Arvid Heise-3
Hi Rinat,

Which API are you using? If you use datastream API, the common way to simulate side inputs (which is what you need) is to use a broadcast. There is an example on SO [1].


On Sat, Oct 10, 2020 at 7:12 PM Sharipov, Rinat <[hidden email]> wrote:
Hi mates !

I'm in the beginning of the road of building a recommendation pipeline on top of Flink.
I'm going to register a list of UDF python functions on job startups where each UDF is an ML model.

Over time new model versions appear in the ML registry and I would like to update my UDF functions on the fly without need to restart the whole job.
Could you tell me, whether it's possible or not ? Maybe the community can give advice on how such tasks can be solved using Flink and what other approaches exist.

Thanks a lot for your help and advice !




--

Arvid Heise | Senior Java Developer


Follow us @VervericaData

--

Join Flink Forward - The Apache Flink Conference

Stream Processing | Event Driven | Real Time

--

Ververica GmbH | Invalidenstrasse 115, 10115 Berlin, Germany

--

Ververica GmbH
Registered at Amtsgericht Charlottenburg: HRB 158244 B
Managing Directors: Timothy Alexander Steinert, Yip Park Tung Jason, Ji (Toni) Cheng   
Reply | Threaded
Open this post in threaded view
|

Re: [PyFlink] update udf functions on the fly

Rinat
Hi Arvid, thx for your reply.

We are already using the approach with control streams to propagate business rules through our data-pipeline.

Because all our models are powered by Python, I'm going to use Table API and register UDF functions, where each UDF is a separate model.

So my question is - can I update the UDF function on the fly without a job restart ? 
Because new model versions become available on a daily basis and we should use them as soon as possible.

Thx !




пн, 12 окт. 2020 г. в 11:32, Arvid Heise <[hidden email]>:
Hi Rinat,

Which API are you using? If you use datastream API, the common way to simulate side inputs (which is what you need) is to use a broadcast. There is an example on SO [1].


On Sat, Oct 10, 2020 at 7:12 PM Sharipov, Rinat <[hidden email]> wrote:
Hi mates !

I'm in the beginning of the road of building a recommendation pipeline on top of Flink.
I'm going to register a list of UDF python functions on job startups where each UDF is an ML model.

Over time new model versions appear in the ML registry and I would like to update my UDF functions on the fly without need to restart the whole job.
Could you tell me, whether it's possible or not ? Maybe the community can give advice on how such tasks can be solved using Flink and what other approaches exist.

Thanks a lot for your help and advice !




--

Arvid Heise | Senior Java Developer


Follow us @VervericaData

--

Join Flink Forward - The Apache Flink Conference

Stream Processing | Event Driven | Real Time

--

Ververica GmbH | Invalidenstrasse 115, 10115 Berlin, Germany

--

Ververica GmbH
Registered at Amtsgericht Charlottenburg: HRB 158244 B
Managing Directors: Timothy Alexander Steinert, Yip Park Tung Jason, Ji (Toni) Cheng   
Reply | Threaded
Open this post in threaded view
|

Re: [PyFlink] update udf functions on the fly

Dian Fu
Hi Rinat,

Do you want to replace the UDFs with new ones on the fly or just want to update the model which could be seen as instance variables inside the UDF?

For the former case, it's not supported AFAIK.
For the latter case, I think you could just update the model in the UDF periodically or according to some custom strategy. It's the behavior of the UDF.

Regards,
Dian

在 2020年10月12日,下午5:51,Sharipov, Rinat <[hidden email]> 写道:

Hi Arvid, thx for your reply.

We are already using the approach with control streams to propagate business rules through our data-pipeline.

Because all our models are powered by Python, I'm going to use Table API and register UDF functions, where each UDF is a separate model.

So my question is - can I update the UDF function on the fly without a job restart ? 
Because new model versions become available on a daily basis and we should use them as soon as possible.

Thx !




пн, 12 окт. 2020 г. в 11:32, Arvid Heise <[hidden email]>:
Hi Rinat,

Which API are you using? If you use datastream API, the common way to simulate side inputs (which is what you need) is to use a broadcast. There is an example on SO [1].


On Sat, Oct 10, 2020 at 7:12 PM Sharipov, Rinat <[hidden email]> wrote:
Hi mates !

I'm in the beginning of the road of building a recommendation pipeline on top of Flink.
I'm going to register a list of UDF python functions on job startups where each UDF is an ML model.

Over time new model versions appear in the ML registry and I would like to update my UDF functions on the fly without need to restart the whole job.
Could you tell me, whether it's possible or not ? Maybe the community can give advice on how such tasks can be solved using Flink and what other approaches exist.

Thanks a lot for your help and advice !




--
Arvid Heise | Senior Java Developer

Follow us @VervericaData
--
Join Flink Forward - The Apache Flink Conference
Stream Processing | Event Driven | Real Time
--
Ververica GmbH | Invalidenstrasse 115, 10115 Berlin, Germany
--
Ververica GmbH
Registered at Amtsgericht Charlottenburg: HRB 158244 B
Managing Directors: Timothy Alexander Steinert, Yip Park Tung Jason, Ji (Toni) Cheng   

Reply | Threaded
Open this post in threaded view
|

Re: [PyFlink] update udf functions on the fly

Rinat
Hi Dian, thx for your reply !

I was wondering to replace UDF on the fly from Flink, of course I'm pretty sure that it's possible to implement update logic directly in Python, thx for idea

Regards, 
Rinat

пн, 12 окт. 2020 г. в 14:20, Dian Fu <[hidden email]>:
Hi Rinat,

Do you want to replace the UDFs with new ones on the fly or just want to update the model which could be seen as instance variables inside the UDF?

For the former case, it's not supported AFAIK.
For the latter case, I think you could just update the model in the UDF periodically or according to some custom strategy. It's the behavior of the UDF.

Regards,
Dian

在 2020年10月12日,下午5:51,Sharipov, Rinat <[hidden email]> 写道:

Hi Arvid, thx for your reply.

We are already using the approach with control streams to propagate business rules through our data-pipeline.

Because all our models are powered by Python, I'm going to use Table API and register UDF functions, where each UDF is a separate model.

So my question is - can I update the UDF function on the fly without a job restart ? 
Because new model versions become available on a daily basis and we should use them as soon as possible.

Thx !




пн, 12 окт. 2020 г. в 11:32, Arvid Heise <[hidden email]>:
Hi Rinat,

Which API are you using? If you use datastream API, the common way to simulate side inputs (which is what you need) is to use a broadcast. There is an example on SO [1].


On Sat, Oct 10, 2020 at 7:12 PM Sharipov, Rinat <[hidden email]> wrote:
Hi mates !

I'm in the beginning of the road of building a recommendation pipeline on top of Flink.
I'm going to register a list of UDF python functions on job startups where each UDF is an ML model.

Over time new model versions appear in the ML registry and I would like to update my UDF functions on the fly without need to restart the whole job.
Could you tell me, whether it's possible or not ? Maybe the community can give advice on how such tasks can be solved using Flink and what other approaches exist.

Thanks a lot for your help and advice !




--
Arvid Heise | Senior Java Developer

Follow us @VervericaData
--
Join Flink Forward - The Apache Flink Conference
Stream Processing | Event Driven | Real Time
--
Ververica GmbH | Invalidenstrasse 115, 10115 Berlin, Germany
--
Ververica GmbH
Registered at Amtsgericht Charlottenburg: HRB 158244 B
Managing Directors: Timothy Alexander Steinert, Yip Park Tung Jason, Ji (Toni) Cheng