Hello Flinkers,
I am building a *streaming* prototype system on top of Flink and I would ideally like to enable ML training (if possible DL) in Flink. It would be nice to lay down all the existing libraries that provide primitives enabling the training of ML models.

I assume it is more efficient to do all the training in Flink (somehow) rather than (re)training a model in TensorFlow (or PyTorch) and porting it to a Flink job. For instance:
https://stackoverflow.com/questions/59563265/embedd-existing-ml-model-in-apache-flink
Especially in streaming ML systems, the training and the serving should both happen in an online fashion (a minimal sketch of what such test-then-train SGD could look like on the DataStream API appears at the end of this message).

To initialize the pool, I have found the following options that run on top of Flink, i.e., leveraging the engine for distributed and scalable ML training.

1) *FlinkML (DataSet API)*
https://ci.apache.org/projects/flink/flink-docs-release-1.2/dev/libs/ml/index.html
This is not for streaming ML, as it sits on top of the DataSet API. In addition, the library was recently dropped
https://stackoverflow.com/questions/58752787/what-is-the-status-of-flinkml
but there is ongoing development (??) of a new library on top of the Table API:
https://cwiki.apache.org/confluence/display/FLINK/FLIP-39+Flink+ML+pipeline+and+ML+libs
https://issues.apache.org/jira/browse/FLINK-12470
which is not in the 1.10 distribution.

2) *Apache Mahout* https://mahout.apache.org/
I thought it was long dead, but recently they started developing it again.

3) *Apache SAMOA* https://samoa.incubator.apache.org/
They are developing it, but slowly. It has been an incubator project since 2013.

4) *FlinkML Organization* https://github.com/FlinkML
This one has interesting repos, e.g. flink-jpmml
https://github.com/FlinkML/flink-jpmml
and an implementation of a parameter server
https://github.com/FlinkML/flink-parameter-server
which is really useful for enabling distributed training, in the sense that the model itself is also distributed during training.
Though, the repo(s) are not really active.

5) *DeepLearning4j* https://deeplearning4j.org/
This is a distributed deep learning library that was said to also work on top of Flink (here:
http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/Flink-support-for-DeepLearning4j-or-other-deep-learning-library-td12157.html)
I am not interested at all in GPU support, but I am wondering if anyone has successfully used this one on top of Flink.

6) *Proteus - SOLMA* https://github.com/proteus-h2020/proteus-solma
It is a scalable online learning library on top of Flink, and is the output of an H2020 research project called PROTEUS.
http://www.bdva.eu/sites/default/files/hbouchachia_sacbd-ecsa18.pdf

7) *Alibaba - Alink*
https://github.com/alibaba/Alink/blob/master/README.en-US.md
A machine learning algorithm platform from Alibaba which is actively maintained.

8) *Alibaba - flink-ai-extended* https://github.com/alibaba/flink-ai-extended
This project extends deep learning frameworks onto Flink; it currently supports running TensorFlow on Flink.

These are all the systems I have found that do ML using Flink's engine.

*Questions*
(i) Has anyone used them?
(ii) More specifically, has someone implemented *Stochastic Gradient Descent, Skip-gram models, Autoencoders* with any of the above tools (or others)?

*Remarks*
If you have any experiences/comments/additions to share, please do! Gotta Catch 'Em All!
<https://www.youtube.com/watch?v=MpaHR-V_R-o>

Best,
Max
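As a concrete reference point for the "online training and serving" setup above, here is a minimal sketch of prequential (test-then-train) SGD for a linear model, written against the plain DataStream API. The LabeledPoint type, feature dimension, learning rate, and constant-key partitioning are assumptions made purely for illustration; this is not an API of any of the libraries listed.

import org.apache.flink.api.common.functions.RichFlatMapFunction;
import org.apache.flink.api.common.state.ValueState;
import org.apache.flink.api.common.state.ValueStateDescriptor;
import org.apache.flink.api.java.functions.KeySelector;
import org.apache.flink.configuration.Configuration;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.util.Collector;

public class OnlineSgdSketch {

    /** Hypothetical input record: a feature vector plus its label. */
    public static class LabeledPoint {
        public double[] features;
        public double label;

        public LabeledPoint() {}

        public LabeledPoint(double[] features, double label) {
            this.features = features;
            this.label = label;
        }
    }

    /** Emits a prediction for each sample, then takes one SGD step on its label. */
    public static class PrequentialSgd extends RichFlatMapFunction<LabeledPoint, Double> {
        private final int dim;
        private final double learningRate;
        private transient ValueState<double[]> weights;

        PrequentialSgd(int dim, double learningRate) {
            this.dim = dim;
            this.learningRate = learningRate;
        }

        @Override
        public void open(Configuration parameters) {
            // Keyed state keeps the model fault tolerant via Flink checkpoints.
            weights = getRuntimeContext().getState(
                    new ValueStateDescriptor<>("weights", double[].class));
        }

        @Override
        public void flatMap(LabeledPoint p, Collector<Double> out) throws Exception {
            double[] w = weights.value();
            if (w == null) {
                w = new double[dim];
            }
            // Test: score the sample with the current model before seeing its label.
            double prediction = 0.0;
            for (int i = 0; i < dim; i++) {
                prediction += w[i] * p.features[i];
            }
            out.collect(prediction);
            // Train: one gradient step on the squared error.
            double error = prediction - p.label;
            for (int i = 0; i < dim; i++) {
                w[i] -= learningRate * error * p.features[i];
            }
            weights.update(w);
        }
    }

    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        DataStream<LabeledPoint> samples = env.fromElements(
                new LabeledPoint(new double[]{1.0, 2.0}, 5.0),
                new LabeledPoint(new double[]{2.0, 1.0}, 4.0));

        samples
                // Constant key => one global model partition (illustrative only; a
                // real job would shard the model or use a parameter server).
                .keyBy(new KeySelector<LabeledPoint, Integer>() {
                    @Override
                    public Integer getKey(LabeledPoint p) {
                        return 0;
                    }
                })
                .flatMap(new PrequentialSgd(2, 0.01))
                .print();

        env.execute("online-sgd-sketch");
    }
}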
Hi Max,
as far as I know, a better ML story for Flink is in the making. I will loop in Becket in CC, who may give you more information.

Regards,
Timo
Hi Max,

Thanks for the question and for sharing your findings. To be honest, I was not aware of some of these projects until I saw your list.

First, to answer your questions:

> (i) Has anyone used them?

While I am not sure about the number of users of every listed project, Alink is definitely used by Alibaba. In fact, the Alink team is trying to contribute the code to the Flink repo and become the new FlinkML library.

Besides, I would like to add flink-ai-extended (https://github.com/alibaba/flink-ai-extended) to the list. This project allows you to run TensorFlow / PyTorch on top of Flink. It is actively used and maintained by Alibaba as well.

> (ii) More specifically, has someone implemented *Stochastic Gradient
> Descent, Skip-gram models, Autoencoders* ...

I think Alink has SGD there, but I did not find skip-gram / autoencoders.

Some more comments / replies below:

> I assume it is more efficient to do all the training in Flink (somehow)

I guess it depends on what exactly you want to do. If you are doing a training run that lasts for hours with many rounds of iterations until it converges, having it trained separately and then porting it to Flink for inference might not lose too much efficiency. However, if you are doing online learning to incrementally update your model as the samples flow by, having such incremental training embedded into Flink would make a lot of sense. Flink-ai-extended was created to support both cases, but it is definitely more attractive in the incremental training case.

> 1) *FlinkML (DataSet API)*

We removed the DataSet-based FlinkML library because at that point it looked like there were no users of it. Removing it allows us to use cleaner package paths. That said, personally I agree that we should have marked the library as deprecated and removed it from the code base in a later release.

It looks like you are looking for an ML algorithm library. I am not sure if you are also interested in the ML engineering part. We have an ongoing project called Flink AI Flow which allows you to define an end-to-end online learning workflow, with datasets, models and metrics managed. I gave a talk about it at the recent Flink Forward virtual event. The videos should be available soon. But feel free to reach out to me for more details.

Thanks,

Jiangjie (Becket) Qin
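To make the "train separately and then port it to Flink for inference" path concrete, here is a minimal sketch in which linear-model weights trained outside Flink are loaded once per task in open() and applied to the stream. The model file path, its one-weight-per-line text format, and the scorer class are illustrative assumptions, not an API of flink-ai-extended or Alink.

import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.List;

import org.apache.flink.api.common.functions.RichMapFunction;
import org.apache.flink.configuration.Configuration;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class OfflineModelServingSketch {

    /** Scores each feature vector with weights that were trained outside Flink. */
    public static class LinearModelScorer extends RichMapFunction<double[], Double> {
        private final String modelPath; // hypothetical location of the exported weights
        private transient double[] weights;

        LinearModelScorer(String modelPath) {
            this.modelPath = modelPath;
        }

        @Override
        public void open(Configuration parameters) throws Exception {
            // Assumed export format: one weight per line in a plain text file.
            List<String> lines = Files.readAllLines(Paths.get(modelPath));
            weights = lines.stream().mapToDouble(Double::parseDouble).toArray();
        }

        @Override
        public Double map(double[] features) {
            double score = 0.0;
            for (int i = 0; i < weights.length && i < features.length; i++) {
                score += weights[i] * features[i];
            }
            return score;
        }
    }

    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        DataStream<double[]> features = env.fromElements(
                new double[]{1.0, 0.5},
                new double[]{0.2, 0.8});

        features
                .map(new LinearModelScorer("/path/to/exported-weights.txt"))
                .print();

        env.execute("offline-model-serving-sketch");
    }
}

In this setup, retraining stays in whatever framework produced the weights; the Flink job only has to reload the file (or, for example, receive updated weights on a broadcast stream) whenever a new model version is ready.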
Hello Timo and Becket,
@Timo: Thanks a lot for forwarding the message.

@Becket: Thanks for your answer. I was not aware of the *flink-ai-extended* project. I was also not aware that Alink is striving to become the new FlinkML. I will definitely look into Alink and flink-ai-extended. Alibaba knows better haha.

To address your answer:

"I guess it depends on what exactly you want to do. If you are doing a training run that lasts for hours with many rounds of iterations until it converges, having it trained separately and then porting it to Flink for inference might not lose too much efficiency. However, if you are doing online learning to incrementally update your model as the samples flow by, having such incremental training embedded into Flink would make a lot of sense. Flink-ai-extended was created to support both cases, but it is definitely more attractive in the incremental training case."

--> Well, since I focus on streaming, I think an online training and serving solution, a.k.a. prequential training, would be more suitable. There are two problems I see in this direction though:

1) To enable *efficient* and *reliable* (super important as well) training in a streaming fashion, the Flink DataStream part should support iterations thoroughly. However, I have read from multiple sources that iterations on Flink are not there yet (the DataStream iteration primitive I am referring to is sketched below this message). This is why I searched for other solutions, to investigate what they do. Alibaba seems like a good direction since, as you said, Alink is used in production.

2) Due to this lack of reliability, in the majority of use cases prequential training is not chosen by companies, which instead rely on the solution you described; hence, (re)training the model (maybe for hours) and porting it to Flink whenever it is ready.

Nevertheless, thanks a lot for your answers.

@Flinkers: Let's gather any other solutions that exist and are not listed.

Best,
Max
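For reference, the DataStream iteration primitive mentioned in point 1) is sketched below, closely following the decrement-until-zero example from the Flink DataStream documentation; the input values are illustrative, and a real online-learning loop would place the model-update step where the map is.

import org.apache.flink.api.common.functions.FilterFunction;
import org.apache.flink.api.common.functions.MapFunction;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.datastream.IterativeStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class IterationSketch {

    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        DataStream<Long> input = env.fromElements(5L, 3L, 8L);

        // Open the loop: elements sent to the feedback edge re-enter here.
        // iterate(timeoutMillis) can be used instead to let the loop terminate
        // when no feedback data arrives for a while.
        IterativeStream<Long> loop = input.iterate();

        DataStream<Long> step = loop.map(new MapFunction<Long, Long>() {
            @Override
            public Long map(Long value) {
                return value - 1;
            }
        });

        // Values still above zero are fed back for another pass through the loop ...
        DataStream<Long> feedback = step.filter(new FilterFunction<Long>() {
            @Override
            public boolean filter(Long value) {
                return value > 0;
            }
        });
        loop.closeWith(feedback);

        // ... everything else leaves the loop as the final output.
        step.filter(new FilterFunction<Long>() {
            @Override
            public boolean filter(Long value) {
                return value <= 0;
            }
        }).print();

        env.execute("datastream-iteration-sketch");
    }
}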
Hello Becket,
I just watched your Flink Forward talk. Really interesting! I leave the link here as it is related to the post:

AI Flow (FF20) <https://www.youtube.com/watch?v=xiYJTCj2zUU>

Best,
Max