FlinkML and DataStream API

FlinkML and DataStream API

hkmaki

Hi,

I’m wondering if there is a way to use FlinkML and make predictions continuously for test data coming from a DataStream.

I know FlinkML only supports the DataSet API (batch) at the moment, but is there a way to convert a DataStream into DataSets? I’m thinking of something like:

0. (fit the model in batch mode)
1. window the DataStream
2. convert the windowed stream to DataSets
3. use the FlinkML methods to make predictions
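
A minimal sketch of what steps 1 to 3 amount to, written in plain Java rather than against the Flink or FlinkML APIs. An iterator stands in for the DataStream, count-based tumbling windows stand in for the windowing step, and each full window is handed to a batch-style predict call. `LinearModel`, `MicroBatcher`, and the fixed weights are hypothetical names and values chosen for illustration; the weights stand in for whatever the batch fit in step 0 would produce.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// Hypothetical stand-in for a model fitted in batch mode (step 0); not a FlinkML class.
final class LinearModel {
    private final double[] weights;
    private final double intercept;

    LinearModel(double[] weights, double intercept) {
        this.weights = weights.clone();
        this.intercept = intercept;
    }

    // Batch-style prediction: scores a whole collection at once,
    // the way a DataSet-based predict operation would.
    List<Double> predictBatch(List<double[]> batch) {
        List<Double> out = new ArrayList<>(batch.size());
        for (double[] features : batch) {
            double y = intercept;
            for (int i = 0; i < weights.length; i++) {
                y += weights[i] * features[i];
            }
            out.add(y);
        }
        return out;
    }
}

final class MicroBatcher {
    // Steps 1-3: cut the stream into count-based tumbling windows,
    // treat each window as a small batch, and predict on it.
    static List<Double> predictInWindows(Iterator<double[]> stream,
                                         int windowSize,
                                         LinearModel model) {
        List<Double> predictions = new ArrayList<>();
        List<double[]> window = new ArrayList<>();
        while (stream.hasNext()) {
            window.add(stream.next());
            if (window.size() == windowSize) {
                predictions.addAll(model.predictBatch(window));
                window.clear();
            }
        }
        if (!window.isEmpty()) {
            // Flush the final, partially filled window.
            predictions.addAll(model.predictBatch(window));
        }
        return predictions;
    }
}
```

Calling `MicroBatcher.predictInWindows(records.iterator(), 2, model)` scores the records two at a time. Predictions for a record are only available once its window closes, so the window size trades latency against per-batch overhead.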

 

BR,
Hanna

Disclaimer: This message and any attachments thereto are intended solely for the addressed recipient(s) and may contain confidential information. If you are not the intended recipient, please notify the sender by reply e-mail and delete the e-mail (including any attachments thereto) without producing, distributing or retaining any copies thereof. Any review, dissemination or other use of, or taking of any action in reliance upon, this information by persons or entities other than the intended recipient(s) is prohibited. Thank you.

Re: FlinkML and DataStream API

Theodore Vasiloudis
Hello Mäki, 

I think what you would like to do is train a model in batch mode and then use the Flink streaming API to serve that model and make predictions.

While FlinkML currently has no integrated way to do that, I definitely think it's possible. I know Márton Balassi has been working on something like this for the ALS algorithm, but I can't find the code right now on mobile.
The general idea is to keep your model as state and use it to make predictions on a stream of incoming data.
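
To make the "model as state" idea concrete, here is a minimal sketch in plain Java. It is not the ALS code mentioned above and it does not use the Flink API. It only mimics the shape of a two-input streaming operator: one input delivers model updates (for example, the result of a batch fit), the other delivers records to score. `ModelServingOperator`, its method names, and the linear model are assumptions made for this example. In a real Flink job the same role could be played by a function over two connected streams (such as a `CoFlatMapFunction`), with the model kept in Flink-managed state instead of a plain field.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical operator with two inputs: model updates and records to score.
final class ModelServingOperator {
    // The "state": the most recent model parameters. Here it is a plain field;
    // a streaming engine would checkpoint it so it survives failures.
    private double[] weights;
    private double intercept;

    // Collects emitted predictions so the sketch can be inspected.
    private final List<Double> output = new ArrayList<>();

    // Called for each element of the model stream: replace the stored model.
    void onModelUpdate(double[] newWeights, double newIntercept) {
        this.weights = newWeights.clone();
        this.intercept = newIntercept;
    }

    // Called for each element of the data stream: score it with the stored model.
    void onRecord(double[] features) {
        if (weights == null) {
            // No model has arrived yet. This sketch drops the record;
            // buffering it until a model arrives is another option.
            return;
        }
        double y = intercept;
        for (int i = 0; i < weights.length; i++) {
            y += weights[i] * features[i];
        }
        output.add(y);
    }

    List<Double> emitted() {
        return output;
    }
}
```

Each record is scored as soon as it arrives, using whichever model was stored most recently, so no windowing or conversion to DataSets is needed on the prediction path.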

Model serving is definitely something we'll be working on in the future; I'll have a master's student working on exactly that next semester.

--
Sent from a mobile device. May contain autocorrect errors.

On Dec 21, 2016 5:24 PM, "Mäki Hanna" <[hidden email]> wrote:

I’m wondering if there is a way to use FlinkML and make predictions continuously for test data coming from a DataStream.

Re: FlinkML and DataStream API

Matt
I'm interested in that code you mentioned too, I hope you can find it.

Regards,
Matt

On Dec 21, 2016, at 17:12, Theodore Vasiloudis <[hidden email]> wrote:

I know Márton Balassi has been working on something like this for the ALS algorithm, but I can't find the code right now on mobile.

Re: FlinkML and DataStream API

Márton Balassi

On Wed, Dec 21, 2016 at 9:38 PM, <[hidden email]> wrote:
I'm interested in that code you mentioned too, I hope you can find it.