Hi; I am trying to train and predict with the same data set. I expect the accuracy should be 100%, am I wrong? When I predict on the same set it fails, and it also classifies some examples as "-1", which is not a label in the training set. What is wrong with this code?

Code:

    import org.apache.flink.api.scala._
    import org.apache.flink.ml.classification.SVM
    import org.apache.flink.ml.common.LabeledVector
    import org.apache.flink.ml.math.SparseVector

    def main(args: Array[String]): Unit = {
      val env = ExecutionEnvironment.getExecutionEnvironment

      val training = Seq(
        new LabeledVector(1.0, new SparseVector(10, Array(0, 2, 3), Array(1.0, 1.0, 1.0))),
        new LabeledVector(1.0, new SparseVector(10, Array(0, 1, 5, 9), Array(1.0, 1.0, 1.0, 1.0))),
        new LabeledVector(0.0, new SparseVector(10, Array(0, 2), Array(0.0, 1.0))),
        new LabeledVector(0.0, new SparseVector(10, Array(0), Array(0.0))),
        new LabeledVector(0.0, new SparseVector(10, Array(0, 2), Array(0.0, 1.0))),
        new LabeledVector(0.0, new SparseVector(10, Array(0), Array(0.0))))

      val trainingDS = env.fromCollection(training)
      val testingDS = env.fromCollection(training)

      val svm = new SVM().setBlocks(env.getParallelism)
      svm.fit(trainingDS)

      // evaluate() returns (true label, predicted label) pairs
      val predictions = svm.evaluate(testingDS.map(x => (x.vector, x.label)))
      predictions.print()
    }

Output:

    (1.0,1.0)
    (1.0,1.0)
    (0.0,1.0)
    (0.0,-1.0)
    (0.0,1.0)
    (0.0,-1.0)
No, you don't get 100% accuracy in this case, and you wouldn't even want that: it would be a severe case of overfitting. You would only get it if your data set were linearly separable (or separable with a finely tuned kernel), and in that case an SVM would be overkill and more traditional methods would suffice. Flink's SVM implementation for binary classification returns "-1" as the default label for the "negative" class. It is a rather raw implementation, so it is best used only when you have a clear idea of the underlying process; treating it as a black box, as you would with more mature ML libraries, can lead to surprises like the one you are seeing.
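A minimal sketch of one way to make the printed pairs comparable: remap the {0, 1} labels onto the {-1, +1} space the classifier predicts in before fitting. It reuses env and trainingDS from the snippet above; toSvmLabel is a hypothetical helper, not part of the Flink API.

    import org.apache.flink.api.scala._
    import org.apache.flink.ml.classification.SVM
    import org.apache.flink.ml.common.LabeledVector

    // Hypothetical helper: map the question's {0.0, 1.0} labels onto the
    // {-1.0, +1.0} space that the binary SVM actually predicts in.
    def toSvmLabel(label: Double): Double = if (label > 0.5) 1.0 else -1.0

    val relabeled: DataSet[LabeledVector] =
      trainingDS.map(lv => new LabeledVector(toSvmLabel(lv.label), lv.vector))

    val svm = new SVM().setBlocks(env.getParallelism)
    svm.fit(relabeled)

    // evaluate() now pairs a {-1, +1} truth with a {-1, +1} prediction,
    // so the two columns of the printed tuples are directly comparable.
    val predictions = svm.evaluate(relabeled.map(lv => (lv.vector, lv.label)))
    predictions.print()

Even with this relabeling the predictions need not all match the true labels, for the overfitting and separability reasons above.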
|