Re: Questions
Posted by
Fabian Hueske-2 on
URL: http://deprecated-apache-flink-user-mailing-list-archive.369.s1.nabble.com/Questions-tp14486p14490.html
Hi Egor,
There is the Row type, which is not strongly typed (unlike TupleX) but supports an arbitrary number of fields as well as null-valued fields.
The DataSet API does not have a split operator, and implementing one would be much more difficult than one would expect. The problem lies in the optimizer, which assumes that all outputs of an operator receive the same data, so we would have to change the plan enumeration logic.
However, there is a workaround. I would convert each String into an Either<Double, String> (Flink features a Java Either type) and feed the resulting dataset to two filters: the first filters on Either.isLeft and the second on Either.isRight. Alternatively, you can use a FlatMap to extract the value directly from the Either:
DataSet<String> input = ...
DataSet<Either<Double, String>> parsed = input.map(/* String -> Either<Double, String> */);
DataSet<Double> doubles = parsed.flatMap(/* if isLeft(): emit left() */);
DataSet<String> failed = parsed.flatMap(/* if isRight(): emit right() */);
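To make the logic of the two passes concrete, here is a minimal plain-Java sketch that runs without Flink. Everything in it is an assumption for illustration: SimpleEither is a hypothetical stand-in for Flink's Either type (not the Flink class itself), java.util lists stand in for DataSets, and the helper names parseAll, doubles and failed are made up. In a real job the same bodies would go into a MapFunction and two FlatMapFunctions.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for Flink's Either type, only for this sketch.
final class SimpleEither<L, R> {
    private final L left;
    private final R right;
    private final boolean isLeft;

    private SimpleEither(L left, R right, boolean isLeft) {
        this.left = left;
        this.right = right;
        this.isLeft = isLeft;
    }

    static <A, B> SimpleEither<A, B> ofLeft(A value) {
        return new SimpleEither<A, B>(value, null, true);
    }

    static <A, B> SimpleEither<A, B> ofRight(B value) {
        return new SimpleEither<A, B>(null, value, false);
    }

    boolean isLeft() { return isLeft; }
    boolean isRight() { return !isLeft; }
    L left() { return left; }
    R right() { return right; }
}

class EitherSplitSketch {

    // "map" step: a parsed number goes left, the unparseable input goes right.
    static SimpleEither<Double, String> parse(String s) {
        try {
            return SimpleEither.ofLeft(Double.valueOf(s));
        } catch (NumberFormatException e) {
            return SimpleEither.ofRight(s);
        }
    }

    static List<SimpleEither<Double, String>> parseAll(List<String> input) {
        List<SimpleEither<Double, String>> out = new ArrayList<>();
        for (String s : input) {
            out.add(parse(s));
        }
        return out;
    }

    // First "flatMap": emit the value only for left elements.
    static List<Double> doubles(List<SimpleEither<Double, String>> parsed) {
        List<Double> out = new ArrayList<>();
        for (SimpleEither<Double, String> e : parsed) {
            if (e.isLeft()) {
                out.add(e.left());
            }
        }
        return out;
    }

    // Second "flatMap": emit the original string only for right elements.
    static List<String> failed(List<SimpleEither<Double, String>> parsed) {
        List<String> out = new ArrayList<>();
        for (SimpleEither<Double, String> e : parsed) {
            if (e.isRight()) {
                out.add(e.right());
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> input = new ArrayList<>();
        input.add("1.5");
        input.add("abc");
        input.add("42");
        List<SimpleEither<Double, String>> parsed = parseAll(input);
        System.out.println("doubles: " + doubles(parsed));
        System.out.println("failed:  " + failed(parsed));
    }
}
```

Both passes read the same parsed collection, which mirrors the point above: every consumer of an operator sees the same data, and each one simply drops the elements it does not care about.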
Best, Fabian