Hi all,
Sorry, this is me again with another question. Maybe I did not search deeply enough, but it seems the FlinkML API is still purely batch. Reading https://cwiki.apache.org/confluence/display/FLINK/FlinkML%3A+Vision+and+Roadmap, there was apparently an intention to "exploit the streaming nature of Flink, and provide functionality designed specifically for data streams", but from my external point of view I don't see much happening here. Is there work in progress towards that? I would personally see two use-cases around streaming: first, updating an existing model that was built in batch; second, triggering predictions from a stream job rather than a batch job. Are these things in the works, or maybe already feasible even though the API looks purely batch-branded? Thanks, -- Christophe
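The first use-case mentioned above (refining a batch-trained model with streaming data) can be sketched in plain, dependency-free Java. This is only an illustration of the pattern, not any Flink or FlinkML API: the class name, the hard-coded weights, and the single-step SGD update are all assumptions made up for the example.

```java
// Hypothetical sketch: a linear model whose weights came from a batch
// job, refined incrementally as labeled events arrive on a stream.
// Nothing here is a Flink API; it only illustrates the idea.
public class OnlineModelUpdate {
    // Pretend these coefficients were produced by an offline batch job.
    static double[] weights = {0.5, -0.2};
    static final double LEARNING_RATE = 0.01;

    // Score one event with the current model.
    static double predict(double[] features) {
        double sum = 0.0;
        for (int i = 0; i < weights.length; i++) {
            sum += weights[i] * features[i];
        }
        return sum;
    }

    // One stochastic-gradient step: nudge the weights toward the label.
    static void update(double[] features, double label) {
        double error = label - predict(features);
        for (int i = 0; i < weights.length; i++) {
            weights[i] += LEARNING_RATE * error * features[i];
        }
    }

    public static void main(String[] args) {
        double before = predict(new double[]{1.0, 1.0});
        // A stream of labeled events refines the batch model in place.
        update(new double[]{1.0, 1.0}, 1.0);
        update(new double[]{1.0, 1.0}, 1.0);
        double after = predict(new double[]{1.0, 1.0});
        System.out.println(before + " -> " + after);
    }
}
```

In a real Flink job the mutable weights would live in keyed or operator state so they survive failures, rather than in a static field.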
Hi Christophe, it is true that FlinkML only targets batch workloads, and there has not been any development for a long time. If you dig through the mailing list thread, you'll find a link to a Google doc that discusses other possible directions. Best, Fabian 2018-02-05 16:43 GMT+01:00 Christophe Jolif <[hidden email]>:
Fabian, I suspect I am not the only one who would love to apply machine learning as part of a Flink pipeline. While waiting for FLIP-23, what are the "best" practices today? Thanks again for your help, -- Christophe On Mon, Feb 5, 2018 at 6:01 PM, Fabian Hueske <[hidden email]> wrote:
That's correct. It's not possible to persist data in memory across jobs in Flink's batch API. Best, Fabian 2018-02-05 18:28 GMT+01:00 Christophe Jolif <[hidden email]>:
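Since jobs cannot share in-memory data, the common workaround for the second use-case is to export the batch-trained model (e.g. as a file of coefficients) and load it inside the streaming job, typically once per task in a `RichMapFunction`'s `open()` method, then apply it in `map()`. Below is a dependency-free sketch of that scoring pattern; `java.util.stream` stands in for a `DataStream` so the example runs anywhere, and the class name and weights are made up for illustration.

```java
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical sketch: score a stream of events with a model trained
// offline. In an actual Flink job the coefficients would be loaded in
// RichMapFunction.open() and applied in map(); java.util.stream is
// used here only so the example is self-contained.
public class StreamScoring {
    // Pretend these coefficients were exported by a batch training job.
    static final double[] WEIGHTS = {2.0, -1.0};

    // Apply the frozen model to one event.
    static double score(double[] features) {
        double s = 0.0;
        for (int i = 0; i < WEIGHTS.length; i++) {
            s += WEIGHTS[i] * features[i];
        }
        return s;
    }

    public static void main(String[] args) {
        List<Double> scores = Stream.of(
                new double[]{1.0, 0.0},
                new double[]{0.0, 1.0},
                new double[]{1.0, 1.0})
            .map(StreamScoring::score)   // the "map" step of the stream job
            .collect(Collectors.toList());
        System.out.println(scores);      // prints [2.0, -1.0, 1.0]
    }
}
```

The key design point is that the model is read-only inside the streaming job, so no state needs to be shared across jobs; updating the model means restarting or reconfiguring the streaming job with new coefficients.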
|