Hello Flinkers,
I am building a *streaming* prototype system on top of Flink and I would ideally like to enable ML training (if possible DL) in Flink. It would be nice to lay down all the existing libraries that provide primitives enabling the training of ML models.

I assume it is more efficient to do all the training in Flink (somehow) rather than (re)training a model in TensorFlow (or PyTorch) and porting it to a Flink job. For instance:
https://stackoverflow.com/questions/59563265/embedd-existing-ml-model-in-apache-flink
Especially in streaming ML systems, the training and the serving should both happen in an online fashion (a minimal sketch of what such test-then-train SGD could look like on the DataStream API appears at the end of this message).

To initialize the pool, I have found the following options that run on top of Flink, i.e., leveraging the engine for distributed and scalable ML training.

1) *FlinkML (DataSet API)*
https://ci.apache.org/projects/flink/flink-docs-release-1.2/dev/libs/ml/index.html
This is not for streaming ML, as it sits on top of the DataSet API. In addition, the library was recently dropped
https://stackoverflow.com/questions/58752787/what-is-the-status-of-flinkml
but there is ongoing development (??) of a new library on top of the Table API:
https://cwiki.apache.org/confluence/display/FLINK/FLIP-39+Flink+ML+pipeline+and+ML+libs
https://issues.apache.org/jira/browse/FLINK-12470
which is not in the 1.10 distribution.

2) *Apache Mahout* https://mahout.apache.org/
I thought it was long dead, but recently they started developing it again.

3) *Apache SAMOA* https://samoa.incubator.apache.org/
They are developing it, but slowly. It has been an incubator project since 2013.

4) *FlinkML Organization* https://github.com/FlinkML
This one has interesting repos, e.g. flink-jpmml
https://github.com/FlinkML/flink-jpmml
and an implementation of a parameter server
https://github.com/FlinkML/flink-parameter-server
which is really useful for enabling distributed training, in the sense that the model itself is also distributed during training.
Though, the repo(s) are not really active.

5) *DeepLearning4j* https://deeplearning4j.org/
This is a distributed deep learning library that was said to also work on top of Flink (here:
http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/Flink-support-for-DeepLearning4j-or-other-deep-learning-library-td12157.html)
I am not interested at all in GPU support, but I am wondering if anyone has successfully used this one on top of Flink.

6) *Proteus - SOLMA* https://github.com/proteus-h2020/proteus-solma
It is a scalable online learning library on top of Flink, and is the output of an H2020 research project called PROTEUS.
http://www.bdva.eu/sites/default/files/hbouchachia_sacbd-ecsa18.pdf

7) *Alibaba - Alink*
https://github.com/alibaba/Alink/blob/master/README.en-US.md
A machine learning algorithm platform from Alibaba which is actively maintained.

8) *Alibaba - flink-ai-extended* https://github.com/alibaba/flink-ai-extended
This project extends deep learning frameworks onto Flink; it currently supports running TensorFlow on Flink.

These are all the systems I have found that do ML using Flink's engine.

*Questions*
(i) Has anyone used them?
(ii) More specifically, has someone implemented *Stochastic Gradient Descent, Skip-gram models, Autoencoders* with any of the above tools (or others)?

*Remarks*
If you have any experiences/comments/additions to share, please do! Gotta Catch 'Em All!
<https://www.youtube.com/watch?v=MpaHR-V_R-o>

Best,
Max
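As a concrete reference point for the "online training and serving" setup above, here is a minimal sketch of prequential (test-then-train) SGD for a linear model, written against the plain DataStream API. The LabeledPoint type, feature dimension, learning rate, and constant-key partitioning are assumptions made purely for illustration; this is not an API of any of the libraries listed.

import org.apache.flink.api.common.functions.RichFlatMapFunction;
import org.apache.flink.api.common.state.ValueState;
import org.apache.flink.api.common.state.ValueStateDescriptor;
import org.apache.flink.api.java.functions.KeySelector;
import org.apache.flink.configuration.Configuration;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.util.Collector;

public class OnlineSgdSketch {

    /** Hypothetical input record: a feature vector plus its label. */
    public static class LabeledPoint {
        public double[] features;
        public double label;

        public LabeledPoint() {}

        public LabeledPoint(double[] features, double label) {
            this.features = features;
            this.label = label;
        }
    }

    /** Emits a prediction for each sample, then takes one SGD step on its label. */
    public static class PrequentialSgd extends RichFlatMapFunction<LabeledPoint, Double> {
        private final int dim;
        private final double learningRate;
        private transient ValueState<double[]> weights;

        PrequentialSgd(int dim, double learningRate) {
            this.dim = dim;
            this.learningRate = learningRate;
        }

        @Override
        public void open(Configuration parameters) {
            // Keyed state keeps the model fault tolerant via Flink checkpoints.
            weights = getRuntimeContext().getState(
                    new ValueStateDescriptor<>("weights", double[].class));
        }

        @Override
        public void flatMap(LabeledPoint p, Collector<Double> out) throws Exception {
            double[] w = weights.value();
            if (w == null) {
                w = new double[dim];
            }
            // Test: score the sample with the current model before seeing its label.
            double prediction = 0.0;
            for (int i = 0; i < dim; i++) {
                prediction += w[i] * p.features[i];
            }
            out.collect(prediction);
            // Train: one gradient step on the squared error.
            double error = prediction - p.label;
            for (int i = 0; i < dim; i++) {
                w[i] -= learningRate * error * p.features[i];
            }
            weights.update(w);
        }
    }

    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        DataStream<LabeledPoint> samples = env.fromElements(
                new LabeledPoint(new double[]{1.0, 2.0}, 5.0),
                new LabeledPoint(new double[]{2.0, 1.0}, 4.0));

        samples
                // Constant key => one global model partition (illustrative only; a
                // real job would shard the model or use a parameter server).
                .keyBy(new KeySelector<LabeledPoint, Integer>() {
                    @Override
                    public Integer getKey(LabeledPoint p) {
                        return 0;
                    }
                })
                .flatMap(new PrequentialSgd(2, 0.01))
                .print();

        env.execute("online-sgd-sketch");
    }
}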
Hi Max,
as far as I know, a better ML story for Flink is in the making. I will loop in Becket in CC, who may give you more information.

Regards,
Timo
Hi Max,

Thanks for the question and for sharing your findings. To be honest, I was not aware of some of these projects until I saw your list.

First, to answer your questions:

> (i) Has anyone used them?

While I am not sure about the number of users of every listed project, Alink is definitely used by Alibaba. In fact, the Alink team is trying to contribute the code to the Flink repo and become the new FlinkML library.

Besides, I would like to add flink-ai-extended (https://github.com/alibaba/flink-ai-extended) to the list. This project allows you to run TensorFlow / PyTorch on top of Flink. It is actively used and maintained by Alibaba as well.

> (ii) More specifically, has someone implemented *Stochastic Gradient
> Descent, Skip-gram models, Autoencoders* ...

I think Alink has SGD there, but I did not find skip-gram / autoencoders.

Some more comments / replies below:

> I assume it is more efficient to do all the training in Flink (somehow)

I guess it depends on what exactly you want to do. If you are doing a training run that lasts for hours with many rounds of iterations until it converges, having it trained separately and then porting it to Flink for inference might not lose too much efficiency. However, if you are doing online learning to incrementally update your model as the samples flow by, having such incremental training embedded into Flink would make a lot of sense. Flink-ai-extended was created to support both cases, but it is definitely more attractive in the incremental training case.

> 1) *FlinkML (DataSet API)*

We removed the DataSet-based FlinkML library because at that point it looked like there were no users of it. Removing it allows us to use cleaner package paths. That said, personally I agree that we should have marked the library as deprecated and removed it from the code base in a later release.

It looks like you are looking for an ML algorithm library. I am not sure if you are also interested in the ML engineering part. We have an ongoing project called Flink AI Flow which allows you to define an end-to-end online learning workflow, with datasets, models and metrics managed. I gave a talk about it at the recent Flink Forward virtual event. The videos should be available soon. But feel free to reach out to me for more details.

Thanks,

Jiangjie (Becket) Qin
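To make the "train separately and then port it to Flink for inference" path concrete, here is a minimal sketch in which linear-model weights trained outside Flink are loaded once per task in open() and applied to the stream. The model file path, its one-weight-per-line text format, and the scorer class are illustrative assumptions, not an API of flink-ai-extended or Alink.

import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.List;

import org.apache.flink.api.common.functions.RichMapFunction;
import org.apache.flink.configuration.Configuration;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class OfflineModelServingSketch {

    /** Scores each feature vector with weights that were trained outside Flink. */
    public static class LinearModelScorer extends RichMapFunction<double[], Double> {
        private final String modelPath; // hypothetical location of the exported weights
        private transient double[] weights;

        LinearModelScorer(String modelPath) {
            this.modelPath = modelPath;
        }

        @Override
        public void open(Configuration parameters) throws Exception {
            // Assumed export format: one weight per line in a plain text file.
            List<String> lines = Files.readAllLines(Paths.get(modelPath));
            weights = lines.stream().mapToDouble(Double::parseDouble).toArray();
        }

        @Override
        public Double map(double[] features) {
            double score = 0.0;
            for (int i = 0; i < weights.length && i < features.length; i++) {
                score += weights[i] * features[i];
            }
            return score;
        }
    }

    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        DataStream<double[]> features = env.fromElements(
                new double[]{1.0, 0.5},
                new double[]{0.2, 0.8});

        features
                .map(new LinearModelScorer("/path/to/exported-weights.txt"))
                .print();

        env.execute("offline-model-serving-sketch");
    }
}

In this setup, retraining stays in whatever framework produced the weights; the Flink job only has to reload the file (or, for example, receive updated weights on a broadcast stream) whenever a new model version is ready.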
Hello Timo and Becket,
@Timo: Thanks a lot for forwarding the message.

@Becket: Thanks for your answer. I was not aware of the *flink-ai-extended* project. I was also not aware that Alink is striving to become the new FlinkML. I will definitely look into Alink and flink-ai-extended. Alibaba knows better haha.

To address your answer:

"I guess it depends on what exactly you want to do. If you are doing a training run that lasts for hours with many rounds of iterations until it converges, having it trained separately and then porting it to Flink for inference might not lose too much efficiency. However, if you are doing online learning to incrementally update your model as the samples flow by, having such incremental training embedded into Flink would make a lot of sense. Flink-ai-extended was created to support both cases, but it is definitely more attractive in the incremental training case."

--> Well, since I focus on streaming, I think an online training and serving solution, a.k.a. prequential training, would be more suitable. There are two problems I see in this direction though:

1) To enable *efficient* and *reliable* (super important as well) training in a streaming fashion, the Flink DataStream part should support iterations thoroughly. However, I have read from multiple sources that iterations on Flink are not there yet (the DataStream iteration primitive I am referring to is sketched below this message). This is why I searched for other solutions, to investigate what they do. Alibaba seems like a good direction since, as you said, Alink is used in production.

2) Due to this lack of reliability, in the majority of use cases prequential training is not chosen by companies, which instead rely on the solution you described; hence, (re)training the model (maybe for hours) and porting it to Flink whenever it is ready.

Nevertheless, thanks a lot for your answers.

@Flinkers: Let's gather any other solutions that exist and are not listed.

Best,
Max
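For reference, the DataStream iteration primitive mentioned in point 1) is sketched below, closely following the decrement-until-zero example from the Flink DataStream documentation; the input values are illustrative, and a real online-learning loop would place the model-update step where the map is.

import org.apache.flink.api.common.functions.FilterFunction;
import org.apache.flink.api.common.functions.MapFunction;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.datastream.IterativeStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class IterationSketch {

    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        DataStream<Long> input = env.fromElements(5L, 3L, 8L);

        // Open the loop: elements sent to the feedback edge re-enter here.
        // iterate(timeoutMillis) can be used instead to let the loop terminate
        // when no feedback data arrives for a while.
        IterativeStream<Long> loop = input.iterate();

        DataStream<Long> step = loop.map(new MapFunction<Long, Long>() {
            @Override
            public Long map(Long value) {
                return value - 1;
            }
        });

        // Values still above zero are fed back for another pass through the loop ...
        DataStream<Long> feedback = step.filter(new FilterFunction<Long>() {
            @Override
            public boolean filter(Long value) {
                return value > 0;
            }
        });
        loop.closeWith(feedback);

        // ... everything else leaves the loop as the final output.
        step.filter(new FilterFunction<Long>() {
            @Override
            public boolean filter(Long value) {
                return value <= 0;
            }
        }).print();

        env.execute("datastream-iteration-sketch");
    }
}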
Hello Becket,
I just watched your Flink Forward talk. Really interesting! I leave the link here as it is related to the post:

AI Flow (FF20) <https://www.youtube.com/watch?v=xiYJTCj2zUU>

Best,
Max