Hi everybody,
I am currently trying to implement a machine learning algorithm on Stratosphere for the ML group at TU Berlin, and I have run into some issues. Here is the first one.
The input data has an unknown dimension, i.e. I have a list of vectors represented as CSV input, with each row representing one vector. So far I have solved the problem with this code snippet:
def getInputSource(XFile: String) = {
  // TODO: make nicer
  dimensions match {
    case 1 => DataSource(XFile, CsvInputFormat[Float](" "))
    case 2 => DataSource(XFile, CsvInputFormat[(Float, Float)](" "))
    case 3 => DataSource(XFile, CsvInputFormat[(Float, Float, Float)](" "))
    case 4 => DataSource(XFile, CsvInputFormat[(Float, Float, Float, Float)](" "))
    case 5 => DataSource(XFile, CsvInputFormat[(Float, Float, Float, Float, Float)](" "))
    ....
  }
}
Unfortunately, some data sets have more dimensions than a Scala tuple can hold (at most 22 elements), for example 350. (And that is apart from the code style.)
Is there a better way to solve this problem?
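One possible direction, as a minimal sketch only: read each row as a plain line of text and split it into an Array[Float], so the dimension no longer has to be encoded in a tuple type. The names VectorParsing, parseVector and parseVectors are hypothetical helpers in plain Scala, not part of Stratosphere; whether and how this plugs into a Stratosphere input format is left open here.

```scala
// Hypothetical helpers (plain Scala, not a Stratosphere API): parse rows of
// delimiter-separated floats into vectors of arbitrary dimension.
object VectorParsing {

  // Parses one row, e.g. "1.0 2.5 3", into Array(1.0f, 2.5f, 3.0f).
  // The delimiter is quoted so that it is matched literally, not as a regex.
  def parseVector(line: String, delimiter: String = " "): Array[Float] =
    line.trim
      .split(java.util.regex.Pattern.quote(delimiter))
      .filter(_.nonEmpty) // tolerate repeated delimiters
      .map(_.toFloat)

  // Parses all non-empty rows and checks that they share one dimension.
  def parseVectors(lines: Seq[String], delimiter: String = " "): Seq[Array[Float]] = {
    val vectors = lines.filter(_.trim.nonEmpty).map(parseVector(_, delimiter))
    require(vectors.map(_.length).distinct.size <= 1, "rows have differing dimensions")
    vectors
  }
}
```

With this, a row with 350 values simply becomes an array of length 350, and the dimension can be taken from the first parsed row instead of being fixed in the code.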
Cheers,
Max