Hi Lydia,
That is certainly possible, but you need to adapt the algorithm a bit. The straightforward approach is to replicate the input data and tag each copy with the ID of a k-means run. For example, a data point (1, 2, 3) would be replicated into three data points (10, 1, 2, 3), (15, 1, 2, 3), and (20, 1, 2, 3), where the first field identifies the run by its number of centers. From there you need a bit of custom partitioning and composite keys that include the run ID to shuffle the data to the right workers.

Hope that helps,
Fabian
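
P.S. A rough sketch of the replication idea in plain Python, not tied to any framework's API. The function names (replicate, assign) and the small center lists are made up for illustration; in a real job the grouping on the composite key would be done by the shuffle, and the centers would come from the previous iteration.

```python
def replicate(points, ks):
    """Copy every point once per run, putting the run's k in the first field."""
    return [(k, *p) for p in points for k in ks]


def assign(tagged, centers_by_k):
    """Key each point by (k, index of nearest center of run k).

    The composite key keeps the runs apart: a point tagged with k is only
    ever compared against the centers that belong to run k.
    """
    keyed = []
    for k, *coords in tagged:
        centers = centers_by_k[k]
        nearest = min(
            range(len(centers)),
            key=lambda i: sum((a - b) ** 2 for a, b in zip(coords, centers[i])),
        )
        keyed.append(((k, nearest), tuple(coords)))
    return keyed


# The example from above: one point, three runs with 10, 15 and 20 centers.
tagged = replicate([(1, 2, 3)], ks=[10, 15, 20])

# A tiny assignment step with hypothetical runs of 1 and 2 centers.
centers_by_k = {1: [(0, 0, 0)], 2: [(0, 0, 0), (5, 5, 5)]}
keyed = assign(replicate([(1, 2, 3), (4, 5, 6)], ks=[1, 2]), centers_by_k)
```

Grouping `keyed` on the full key (k, center index) gives the points per center per run, which is what the center update step needs; partitioning on k alone would instead send a whole run to one worker.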