flink ml - k-means

flink ml - k-means

Pa Rö (2015-04-24 15:08 GMT+02:00)
Hi Flink community,

I am currently writing my master's thesis in the field of machine learning. My main task is to evaluate different k-means variants on large data sets (big data). I would like to test Flink ML against Apache Mahout and Apache Hadoop MapReduce in terms of scalability and performance (time and space). What is the current state of clustering support, especially K-Means? Will there be information about a release in the near future?

Best greetings,
Paul

Re: flink ml - k-means

Alexander Alexandrov (Apr 26, 2015 11:14 PM)
Yes, I expect to have one in the next few weeks (the code is actually there, but we need to port it to the Flink ML API). I suggest following the JIRA issue over the coming weeks to check when this is done:

https://issues.apache.org/jira/browse/FLINK-1731

Regards,
Alexander

PS. Bear in mind that we will start with a vanilla implementation of K-Means. For a thorough evaluation, you might also want to check variants like K-Means++.
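Vanilla k-means and K-Means++ differ only in how the initial centroids are chosen: vanilla k-means commonly starts from k randomly picked points (or user-supplied centroids), while K-Means++ draws each further initial centroid with probability proportional to its squared distance to the nearest centroid already chosen. The following is a minimal single-machine Java sketch of that seeding step, assuming points fit in memory as `double[]` arrays. The class and method names are hypothetical; this is not part of Flink or Flink ML.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Random;

/** Hypothetical helper (not a Flink API): K-Means++ seeding on in-memory points. */
class KMeansPlusPlusSeeding {

    /** Squared Euclidean distance between two points of equal dimension. */
    static double squaredDistance(double[] a, double[] b) {
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            double d = a[i] - b[i];
            sum += d * d;
        }
        return sum;
    }

    /**
     * Chooses up to k initial centroids. The first is drawn uniformly at random;
     * each further one is drawn with probability proportional to its squared
     * distance to the nearest centroid chosen so far. Returns fewer than k
     * centroids if all remaining points coincide with chosen centroids.
     */
    static List<double[]> seed(List<double[]> points, int k, Random rnd) {
        List<double[]> centroids = new ArrayList<>();
        centroids.add(points.get(rnd.nextInt(points.size())));
        // minDist[i] = squared distance from point i to its nearest chosen centroid
        double[] minDist = new double[points.size()];
        Arrays.fill(minDist, Double.POSITIVE_INFINITY);
        while (centroids.size() < k) {
            double[] newest = centroids.get(centroids.size() - 1);
            double total = 0.0;
            for (int i = 0; i < points.size(); i++) {
                minDist[i] = Math.min(minDist[i], squaredDistance(points.get(i), newest));
                total += minDist[i];
            }
            if (total == 0.0) {
                break; // every point already coincides with a chosen centroid
            }
            // Sample index i with probability minDist[i] / total.
            double r = rnd.nextDouble() * total;
            int chosen = points.size() - 1;
            double cumulative = 0.0;
            for (int i = 0; i < points.size(); i++) {
                cumulative += minDist[i];
                if (cumulative > r) {
                    chosen = i;
                    break;
                }
            }
            centroids.add(points.get(chosen));
        }
        return centroids;
    }

    public static void main(String[] args) {
        List<double[]> points = Arrays.asList(
                new double[] {0, 0}, new double[] {0, 1},
                new double[] {10, 10}, new double[] {10, 11});
        for (double[] c : seed(points, 2, new Random(42))) {
            System.out.println(Arrays.toString(c));
        }
    }
}
```

The resulting centroids would then be handed to an ordinary k-means loop; only the starting positions change, which is why the two variants can be compared on the same data.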


Re: flink ml - k-means

Till Rohrmann (2015-04-27 9:23 GMT+02:00)

Hi Paul,

If you can't wait, a vanilla implementation is already included in the Flink examples. You should find it under flink/flink-examples.

But we will try to add more clustering algorithms in the near future.

Cheers,
Till
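To make "vanilla" concrete: vanilla k-means (Lloyd's algorithm) alternates two steps, assigning every point to its nearest centroid and then moving each centroid to the mean of its assigned points. The sketch below shows that loop in plain, single-machine Java with a fixed number of rounds and no convergence check. It is not the code of the Flink example and does not use Flink's distributed API; the class and method names are hypothetical.

```java
import java.util.Arrays;

/** Hypothetical single-machine sketch of vanilla k-means (Lloyd's algorithm). */
class VanillaKMeans {

    /** Index of the centroid nearest to p, by squared Euclidean distance. */
    static int nearest(double[] p, double[][] centroids) {
        int best = 0;
        double bestDist = Double.POSITIVE_INFINITY;
        for (int c = 0; c < centroids.length; c++) {
            double d = 0.0;
            for (int i = 0; i < p.length; i++) {
                double diff = p[i] - centroids[c][i];
                d += diff * diff;
            }
            if (d < bestDist) {
                bestDist = d;
                best = c;
            }
        }
        return best;
    }

    /** Runs a fixed number of assign/update rounds starting from the given centroids. */
    static double[][] run(double[][] points, double[][] initial, int iterations) {
        int k = initial.length, dim = initial[0].length;
        double[][] centroids = new double[k][];
        for (int c = 0; c < k; c++) {
            centroids[c] = initial[c].clone(); // do not modify the caller's array
        }
        for (int it = 0; it < iterations; it++) {
            double[][] sums = new double[k][dim];
            long[] counts = new long[k];
            // Assignment step: add each point to the running sum of its nearest centroid.
            for (double[] p : points) {
                int c = nearest(p, centroids);
                counts[c]++;
                for (int i = 0; i < dim; i++) {
                    sums[c][i] += p[i];
                }
            }
            // Update step: move each centroid to the mean of its points;
            // a centroid that received no points keeps its previous position.
            for (int c = 0; c < k; c++) {
                if (counts[c] == 0) {
                    continue;
                }
                for (int i = 0; i < dim; i++) {
                    centroids[c][i] = sums[c][i] / counts[c];
                }
            }
        }
        return centroids;
    }

    public static void main(String[] args) {
        double[][] points = {{0, 0}, {0, 2}, {10, 10}, {10, 12}};
        double[][] initial = {{0, 0}, {10, 10}};
        System.out.println(Arrays.deepToString(run(points, initial, 5)));
        // prints [[0.0, 1.0], [10.0, 11.0]]
    }
}
```

Keeping the two steps separate mirrors how the work splits in a distributed setting: the assignment step is independent per point, while the update step only needs a per-cluster sum and count.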

Re: flink ml - k-means

Pa Rö (2015-04-27 9:36 GMT+02:00)
Hi Alexander and Till,

Thanks for the information; I look forward to the release.
I'm curious how Flink ML compares against Mahout and Spark ML.

Best regards,
Paul

Re: flink ml - k-means

Pa Rö (Mon, May 11, 2015 at 2:46 PM)
Hi,

now I want to implement k-means with Flink.
Do you perhaps know a release date for k-means in Flink ML?

Best regards,
Paul

Re: flink ml - k-means

Stephan Ewen (2015-05-11 21:56 GMT+02:00)
Paul!

Can you use the KMeans example? The code is for three-dimensional points, but you should be able to generalize it easily.
That would be the fastest way to go, without waiting for any release dates.

Stephan
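Generalizing the example as Stephan suggests essentially means replacing a point type with a fixed set of coordinate fields by one that stores its coordinates in an array, so the same k-means code works for any dimensionality. The sketch below shows such a type in plain Java with the operations a k-means step typically needs (summing points, dividing by a count, measuring distance). The class name `VectorPoint`, its methods, and the whitespace-separated input format are assumptions for illustration; they have not been checked against the example's actual classes or input files.

```java
import java.util.Arrays;

/**
 * Hypothetical n-dimensional point (not the Flink example's class): coordinates
 * live in a double[] instead of fixed fields, so any dimensionality works.
 */
class VectorPoint {
    final double[] coords;

    VectorPoint(double... coords) {
        this.coords = coords;
    }

    /** Parses one point from a line of whitespace-separated numbers (assumed input format). */
    static VectorPoint parse(String line) {
        String[] parts = line.trim().split("\\s+");
        double[] c = new double[parts.length];
        for (int i = 0; i < parts.length; i++) {
            c[i] = Double.parseDouble(parts[i]);
        }
        return new VectorPoint(c);
    }

    int dimension() {
        return coords.length;
    }

    /** Component-wise sum, e.g. when accumulating the points of one cluster. */
    VectorPoint add(VectorPoint other) {
        checkSameDimension(other);
        double[] r = new double[coords.length];
        for (int i = 0; i < r.length; i++) {
            r[i] = coords[i] + other.coords[i];
        }
        return new VectorPoint(r);
    }

    /** Divides every coordinate by n, e.g. to turn a cluster sum into a mean. */
    VectorPoint div(long n) {
        double[] r = new double[coords.length];
        for (int i = 0; i < r.length; i++) {
            r[i] = coords[i] / n;
        }
        return new VectorPoint(r);
    }

    /** Euclidean distance to another point of the same dimension. */
    double euclideanDistance(VectorPoint other) {
        checkSameDimension(other);
        double sum = 0.0;
        for (int i = 0; i < coords.length; i++) {
            double d = coords[i] - other.coords[i];
            sum += d * d;
        }
        return Math.sqrt(sum);
    }

    private void checkSameDimension(VectorPoint other) {
        if (other.coords.length != coords.length) {
            throw new IllegalArgumentException(
                    "dimension mismatch: " + coords.length + " vs " + other.coords.length);
        }
    }

    @Override
    public String toString() {
        return Arrays.toString(coords);
    }

    public static void main(String[] args) {
        VectorPoint a = VectorPoint.parse("1 2 3 4");
        VectorPoint b = VectorPoint.parse("3 4 5 6");
        System.out.println(a.add(b).div(2));        // prints [2.0, 3.0, 4.0, 5.0]
        System.out.println(a.euclideanDistance(b)); // prints 4.0
    }
}
```

Each operation returns a new point, which keeps the sketch simple; a version tuned for large data sets might instead update coordinates in place to reduce allocations.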


Re: flink ml - k-means

Pa Rö
