Hi; I am trying to train and predict with the same data set. I expect the accuracy should be 100%, am I wrong? When I predict on the same set it fails, and it also classifies some examples as "-1", which is not a label in the training set. What is wrong with this code?

Code:

    import org.apache.flink.api.scala._
    import org.apache.flink.ml.classification.SVM
    import org.apache.flink.ml.common.LabeledVector
    import org.apache.flink.ml.math.SparseVector

    def main(args: Array[String]): Unit = {
      val env = ExecutionEnvironment.getExecutionEnvironment

      val training = Seq(
        new LabeledVector(1.0, new SparseVector(10, Array(0, 2, 3), Array(1.0, 1.0, 1.0))),
        new LabeledVector(1.0, new SparseVector(10, Array(0, 1, 5, 9), Array(1.0, 1.0, 1.0, 1.0))),
        new LabeledVector(0.0, new SparseVector(10, Array(0, 2), Array(0.0, 1.0))),
        new LabeledVector(0.0, new SparseVector(10, Array(0), Array(0.0))),
        new LabeledVector(0.0, new SparseVector(10, Array(0, 2), Array(0.0, 1.0))),
        new LabeledVector(0.0, new SparseVector(10, Array(0), Array(0.0))))

      val trainingDS = env.fromCollection(training)
      val testingDS = env.fromCollection(training)

      val svm = new SVM().setBlocks(env.getParallelism)
      svm.fit(trainingDS)

      // evaluate() returns (true label, predicted label) pairs
      val predictions = svm.evaluate(testingDS.map(x => (x.vector, x.label)))
      predictions.print()
    }

Output:

    (1.0,1.0)
    (1.0,1.0)
    (0.0,1.0)
    (0.0,-1.0)
    (0.0,1.0)
    (0.0,-1.0)
No, you don't get 100% accuracy in this case, and you wouldn't even want that: it would be a severe case of overfitting. You would only get it if your data set were linearly separable (or separable with a finely tuned kernel), and in that case an SVM would be overkill and more traditional methods would suffice. Flink's SVM implementation for binary classification returns "-1" as the default label for the "negative" class. It is a rather raw implementation, so it is best used only when you have a clear idea of the underlying process; treating it as a black box, as you would with more mature ML libraries, can lead to surprises like the one you are seeing.
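A minimal sketch of one way to make the printed pairs comparable: remap the {0, 1} labels onto the {-1, +1} space the classifier predicts in before fitting. It reuses env and trainingDS from the snippet above; toSvmLabel is a hypothetical helper, not part of the Flink API.

    import org.apache.flink.api.scala._
    import org.apache.flink.ml.classification.SVM
    import org.apache.flink.ml.common.LabeledVector

    // Hypothetical helper: map the question's {0.0, 1.0} labels onto the
    // {-1.0, +1.0} space that the binary SVM actually predicts in.
    def toSvmLabel(label: Double): Double = if (label > 0.5) 1.0 else -1.0

    val relabeled: DataSet[LabeledVector] =
      trainingDS.map(lv => new LabeledVector(toSvmLabel(lv.label), lv.vector))

    val svm = new SVM().setBlocks(env.getParallelism)
    svm.fit(relabeled)

    // evaluate() now pairs a {-1, +1} truth with a {-1, +1} prediction,
    // so the two columns of the printed tuples are directly comparable.
    val predictions = svm.evaluate(relabeled.map(lv => (lv.vector, lv.label)))
    predictions.print()

Even with this relabeling the predictions need not all match the true labels, for the overfitting and separability reasons above.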
|