multiple k-means in parallel

Lydia Ickler
Hi,

I want to run k-means with several different values of k in parallel, so that each worker computes its own k-means clustering. Is that possible?

If I map over a list of integers (the values of k) and then apply k-means inside the map function, I get the following error:
Task not serializable

I am looking forward to your answers!
Lydia

Re: multiple k-means in parallel

Fabian Hueske
Hi Lydia,

that is certainly possible; however, you need to adapt the algorithm a bit.
The straightforward approach is to replicate the input data and assign an ID to each k-means run.
If you have a data point (1, 2, 3), you could replicate it into three data points (10, 1, 2, 3), (15, 1, 2, 3), and (20, 1, 2, 3), where the first field identifies the run by its number of centers (k).
From there, you need a bit of custom partitioning and composite keys to shuffle the data to the right workers.
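The replicate-and-key idea can be sketched without any cluster framework. This is a minimal Python illustration under stated assumptions, not Flink code. `replicate`, `partition_for`, `kmeans`, and `run_all` are hypothetical helper names. The dictionary keyed by `(worker, k)` stands in for the composite key and custom partitioning a real job would use. `kmeans` is plain Lloyd's algorithm seeded with the first k points of each run.

```python
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple

Point = Tuple[float, ...]


def replicate(points: Sequence[Point], ks: Sequence[int]) -> List[tuple]:
    """Copy every point once per run, prefixing the run's k: (1, 2, 3) -> (10, 1, 2, 3), ..."""
    return [(k,) + tuple(p) for k in ks for p in points]


def partition_for(k: int, num_workers: int) -> int:
    """Hypothetical custom partitioner: all records of one run go to the same worker."""
    return k % num_workers


def _dist2(a: Point, b: Point) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def _nearest(p: Point, centers: List[Point]) -> int:
    return min(range(len(centers)), key=lambda i: _dist2(p, centers[i]))


def kmeans(points: List[Point], k: int, iterations: int = 10) -> List[Point]:
    """Plain Lloyd's algorithm; the initial centers are simply the first k points."""
    centers = list(points[:k])
    for _ in range(iterations):
        clusters: Dict[int, List[Point]] = defaultdict(list)
        for p in points:
            clusters[_nearest(p, centers)].append(p)
        # Move each non-empty cluster's center to the mean of its members.
        for i, members in clusters.items():
            dim = len(members[0])
            centers[i] = tuple(sum(m[d] for m in members) / len(members) for d in range(dim))
    return centers


def run_all(points: Sequence[Point], ks: Sequence[int], num_workers: int = 2) -> Dict[int, List[Point]]:
    # Shuffle step: group the tagged records by the composite key (worker, k).
    groups: Dict[Tuple[int, int], List[Point]] = defaultdict(list)
    for rec in replicate(points, ks):
        k, p = rec[0], rec[1:]
        groups[(partition_for(k, num_workers), k)].append(p)
    # Each group is an independent k-means run that its worker processes locally.
    return {k: kmeans(pts, k) for (_, k), pts in groups.items()}


# Usage: two runs (k = 1 and k = 2) over the same small data set.
data = [(1.0, 1.0), (1.2, 0.8), (8.0, 8.0), (8.2, 7.9), (0.9, 1.1)]
result = run_all(data, ks=[1, 2])
```

Because every record carries its run's k as the first field, the runs never mix: grouping on that field keeps each clustering independent, so runs with different k can proceed side by side on different workers. The trade-off is that the input is copied once per value of k.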

Hope that helps,
Fabian

(In reply to Lydia Ickler's message of 2016-11-27 11:48 GMT+01:00.)