Newbie question: Machine Learning Library of Apache Flink

Bilinmek Istemiyor
Hello

I am a complete newbie and I need help. I am evaluating the use of Flink for my academic study and reading the documentation. I have a bit of experience with Apache Spark, and I am asking this question based on that experience.

In Spark, there is a machine learning library built into the framework. To the best of my knowledge, the library is aware of the RDD data structure, and the machine learning algorithms benefit from cluster processing. I have read about Flink's cluster capabilities, but I have not seen a machine learning library for Flink. I have seen some references to a machine learning library for Flink in Google searches, but they are linked to older versions of Flink. It seems the machine learning library has been dropped from Flink in the latest releases.

My questions are:

1. Is it true that there is no dedicated machine learning library for Flink, or am I missing something?
2. If there is no dedicated machine learning library for Flink, what are my options? Can I use any library that has a Scala or Java API?
3. If I use an external machine learning library, how will this affect Flink's cluster processing? Does the processing of the algorithms become bound to one Flink instance? How can the algorithm be scaled across multiple machines?

I appreciate any response. Please answer gently, as if talking to a kid... I am really a newbie...

Thanks in advance...


Re: Newbie question: Machine Learning Library of Apache Flink

Timo Walther
Hi,

it is true that there is no dedicated machine learning library for
Flink. Flink is a general data processing framework. It allows you to
embed any available algorithm library within user-defined functions.
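As a minimal sketch of what that can look like (assuming a Flink 1.x DataStream job in Java; `ThirdPartyScorer` is a hypothetical stand-in for whatever JVM library you choose), the library call simply lives inside a `MapFunction`. Flink runs one copy of the function in each parallel instance, and those instances can sit on different machines, so per-record work is not bound to a single node.

```java
import org.apache.flink.api.common.functions.MapFunction;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class LibraryInUdf {

    // Hypothetical stand-in for a call into a third-party JVM library.
    static final class ThirdPartyScorer {
        static double score(double x) {
            return 1.0 / (1.0 + Math.exp(-x)); // logistic function
        }
    }

    // User-defined function that just delegates each record to the library.
    public static final class ScoreFunction implements MapFunction<Double, Double> {
        @Override
        public Double map(Double value) {
            return ThirdPartyScorer.score(value);
        }
    }

    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        DataStream<Double> input = env.fromElements(-2.0, -1.0, 0.0, 1.0, 2.0);
        input.map(new ScoreFunction())
             .setParallelism(4) // four independent copies of ScoreFunction
             .print();
        env.execute("library-in-udf");
    }
}
```

Note that this only parallelizes work that can be done record by record. Training a single model across the cluster is not automatically distributed just by calling a library inside a function.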

Flink's focus is on stream processing. There are not many dedicated
stream processing algorithms out there. Usually, people run batch jobs
to train models and then only evaluate the trained model in independent
parallel operator instances.
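The train-in-batch, score-in-parallel pattern might look roughly like the sketch below, again assuming the Flink 1.x Java API (newer Flink versions replace `open(Configuration)` with `open(OpenContext)`). `open()` is called once per parallel instance before any record arrives, which is a natural place to load a model. The weights here are hard-coded hypothetical placeholders; a real job would load whatever a separate training job exported.

```java
import org.apache.flink.api.common.functions.RichMapFunction;
import org.apache.flink.configuration.Configuration;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class ModelServing {

    // Scores feature vectors with a linear model that was trained elsewhere.
    public static final class LinearModelScorer extends RichMapFunction<double[], Double> {
        private transient double[] weights;
        private transient double bias;

        @Override
        public void open(Configuration parameters) {
            // Runs once per parallel instance, before map() is called.
            // Hypothetical placeholder values; a real job would read the
            // model produced by the batch training job here.
            weights = new double[] {0.5, -0.25};
            bias = 0.1;
        }

        @Override
        public Double map(double[] features) {
            double sum = bias;
            for (int i = 0; i < weights.length; i++) {
                sum += weights[i] * features[i];
            }
            return sum;
        }
    }

    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        env.fromElements(new double[] {1.0, 2.0}, new double[] {4.0, 0.0})
           .map(new LinearModelScorer())
           .setParallelism(2) // each instance loads its own copy of the model
           .print();
        env.execute("model-serving");
    }
}
```

Because every instance holds its own read-only copy of the model, the instances never need to coordinate, which is why scoring scales out easily while training usually stays in a separate batch job.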

Some ML efforts are already going on (see the links below), and I'm sure
there will be more in the future. But for now, the community is focusing
on developing a very good streaming runtime core.

https://github.com/alibaba/Alink

https://www.ververica.com/blog/flink-for-online-machine-learning-and-real-time-processing-at-weibo

I hope this helps a bit.

Regards,
Timo


On 31.01.21 06:01, Bilinmek Istemiyor wrote:
