ML and Stream


ML and Stream

Christophe Jolif
Hi all,

Sorry, this is me again with another question.

Maybe I did not search deep enough, but it seems the FlinkML API is still purely batch.

If I read https://cwiki.apache.org/confluence/display/FLINK/FlinkML%3A+Vision+and+Roadmap, it seems there was an intent to "exploit the streaming nature of Flink, and provide functionality designed specifically for data streams", but from my external point of view, I don't see much happening here. Is there work in progress towards that?

I personally see two use-cases around streaming: the first is updating an existing model that was built in batch; the second is triggering predictions not through a batch job but in a stream job.
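To make that concrete, here is a rough, framework-agnostic sketch of the two use-cases I have in mind (all names here are made up for illustration; this is not Flink API):

```python
class LinearModel:
    """A stand-in for a model that was trained offline in a batch job."""

    def __init__(self, weights, bias):
        self.weights = weights
        self.bias = bias

    def predict(self, features):
        return sum(w * x for w, x in zip(self.weights, features)) + self.bias

    def update(self, new_weights, new_bias):
        # Use-case 1: refresh the parameters of a batch-built model
        # from a stream of model updates, without restarting the job.
        self.weights, self.bias = new_weights, new_bias


def score_stream(events, model):
    # Use-case 2: trigger predictions one event at a time in a stream job,
    # instead of running a batch job over a complete data set.
    for features in events:
        yield model.predict(features)


model = LinearModel(weights=[0.5, -1.0], bias=0.1)
print(list(score_stream([[2.0, 1.0], [0.0, 0.0]], model)))  # prints [0.1, 0.1]
```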

Are these things in the works? Or are they maybe already feasible, despite the API looking purely batch-oriented?

Thanks,
--
Christophe

Re: ML and Stream

Fabian Hueske-2
Hi Christophe,

it is true that FlinkML only targets batch workloads. Also, there has not been any development for a long time.

In March last year, a discussion was started on the dev mailing list about different machine learning features for stream processing [1].
One result of this discussion was FLIP-23 [2] which will add a library for model serving to Flink, i.e., it can load (and update) machine learning models and evaluate them on a stream.
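As a rough illustration of the model-serving pattern (a framework-agnostic sketch, not the actual FLIP-23 API; all names are made up): a control stream of model updates is merged with the event stream, and each event is evaluated with the latest model.

```python
def serve(merged_stream):
    """merged_stream yields ('model', predict_fn) or ('event', payload) tuples,
    simulating a model-update stream merged with a data stream."""
    current_model = None
    for kind, value in merged_stream:
        if kind == "model":
            current_model = value          # load or hot-swap the model
        elif kind == "event" and current_model is not None:
            yield current_model(value)     # evaluate the event on the stream
        # Events arriving before any model is loaded are dropped here;
        # a real implementation would buffer them or emit a default.


stream = [
    ("model", lambda x: x * 2),
    ("event", 3),
    ("model", lambda x: x + 1),   # model updated mid-stream
    ("event", 3),
]
print(list(serve(stream)))  # prints [6, 4]
```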
If you dig through the mailing list thread, you'll find a link to a Google doc that discusses other possible directions.

Best, Fabian

2018-02-05 16:43 GMT+01:00 Christophe Jolif <[hidden email]>:


Re: ML and Stream

Christophe Jolif
Fabian,

Ok, thanks for the update. Meanwhile I was looking at how I could still leverage the current FlinkML API, but as far as I can see, it lacks the ability to persist its own models? So even for pure batch, that prevents running your (once-built) model in several jobs? Or am I missing something?

I suspect I am not the only one who would love to apply machine learning as part of a Flink processing job. While waiting for FLIP-23, what are the "best" practices today?

Thanks again for your help,
--
Christophe

On Mon, Feb 5, 2018 at 6:01 PM, Fabian Hueske <[hidden email]> wrote:


Re: ML and Stream

Fabian Hueske-2
That's correct.
It's not possible to persist data in memory across jobs in Flink's batch API.
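A common workaround today is to extract the learned parameters at the end of the training job and persist them yourself, so that a later job (batch or streaming) can reload them for scoring. Roughly (a sketch, not a FlinkML feature; names are made up):

```python
import json
import os
import tempfile


def save_model(path, weights, bias):
    # After the training job finishes, collect the learned parameters
    # on the client side and write them out yourself.
    with open(path, "w") as f:
        json.dump({"weights": weights, "bias": bias}, f)


def load_model(path):
    # A later job reloads the parameters and rebuilds the scoring function.
    with open(path) as f:
        m = json.load(f)
    return m["weights"], m["bias"]


path = os.path.join(tempfile.gettempdir(), "model.json")
save_model(path, weights=[0.5, -1.0], bias=0.1)
weights, bias = load_model(path)
print(weights, bias)  # prints [0.5, -1.0] 0.1
```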

Best, Fabian

2018-02-05 18:28 GMT+01:00 Christophe Jolif <[hidden email]>: