Questions

Егор Литвиненко
Hi

Is there a way to handle mapping errors in Flink?
For example, when a string is a valid double, write it to one table, and otherwise write it to another?
If not, what problems do you see with such a feature? And if I were to make a PR, where should I start implementing it?

I saw Tuple1, Tuple2, etc., and many methods for the different tuples to define the types of a DataSet.
But I don't see a Tuple with a custom size. I mean something like new Tuple(List<Class<?>> types).
Did I miss something?

Best regards, Egor Litvinenko

Re: Questions

Fabian Hueske-2
Hi Egor,

There is the Row type, which is not strongly typed (unlike TupleX) but supports an arbitrary number of fields as well as null-valued fields.
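
As a rough illustration of the Row type mentioned above, the following sketch builds a Row whose arity is only known at runtime and describes its field types with a RowTypeInfo. The class name RowSketch and the helper toRow are hypothetical names chosen for this example, and the String/Double/String field types are an assumption; only Row, RowTypeInfo and BasicTypeInfo come from Flink.

```java
import java.util.Arrays;
import java.util.List;

import org.apache.flink.api.common.typeinfo.BasicTypeInfo;
import org.apache.flink.api.java.typeutils.RowTypeInfo;
import org.apache.flink.types.Row;

public class RowSketch {

    // Hypothetical helper: copies a list of arbitrary length into a Row.
    // Unlike TupleX, the arity is chosen at runtime and null fields are allowed.
    static Row toRow(List<?> values) {
        Row row = new Row(values.size());
        for (int i = 0; i < values.size(); i++) {
            row.setField(i, values.get(i));
        }
        return row;
    }

    public static void main(String[] args) {
        Row row = toRow(Arrays.asList("abc", 1.5, null));

        // Field types are declared separately, since Row itself is untyped.
        // The three types below are assumptions made for this example.
        RowTypeInfo rowType = new RowTypeInfo(
                BasicTypeInfo.STRING_TYPE_INFO,
                BasicTypeInfo.DOUBLE_TYPE_INFO,
                BasicTypeInfo.STRING_TYPE_INFO);

        System.out.println(row.getArity() + " fields, type arity " + rowType.getArity());
        System.out.println(row);
    }
}
```

Because Row carries no generic field types, a RowTypeInfo like the one above usually has to be passed to Flink explicitly (for example via returns(...) on an operator) when a DataSet<Row> is produced.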

The DataSet API does not have a split operator, and implementing one would be much more difficult than one would expect. The problem is in the optimizer, which assumes that all outputs of an operator receive the same data, so we would have to change the plan enumeration logic.
However, there is a workaround. I would convert the String into an Either<Double, String> (Flink features a Java Either type) and feed the resulting dataset into two filters: the first filters on Either.isLeft and the second on Either.isRight. Alternatively, you can use a FlatMap to extract the value directly from the Either:

DataSet<String> input = ...
// Left = successfully parsed double, Right = original string that failed to parse
DataSet<Either<Double, String>> parsed = input.map(/* String -> Either */);
DataSet<Double> doubles = parsed.flatMap(/* if isLeft(): emit left() */);
DataSet<String> failed = parsed.flatMap(/* if isRight(): emit right() */);
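
A fuller version of the workaround above might look like the following sketch. It assumes Left carries the parsed Double and Right the original string that failed to parse; the class name EitherSplitSketch, the helper parse, and the sample input values are hypothetical and only serve the example.

```java
import org.apache.flink.api.common.functions.FlatMapFunction;
import org.apache.flink.api.common.functions.MapFunction;
import org.apache.flink.api.java.DataSet;
import org.apache.flink.api.java.ExecutionEnvironment;
import org.apache.flink.types.Either;
import org.apache.flink.util.Collector;

public class EitherSplitSketch {

    // Hypothetical helper: Left = parsed double, Right = unparsable input string.
    static Either<Double, String> parse(String s) {
        try {
            return Either.Left(Double.valueOf(s));
        } catch (NumberFormatException e) {
            return Either.Right(s);
        }
    }

    public static void main(String[] args) throws Exception {
        ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();
        DataSet<String> input = env.fromElements("1.5", "abc", "42", "4x");

        // Anonymous classes (rather than lambdas) keep the generic types
        // visible to Flink's type extraction.
        DataSet<Either<Double, String>> parsed = input.map(
                new MapFunction<String, Either<Double, String>>() {
                    @Override
                    public Either<Double, String> map(String value) {
                        return parse(value);
                    }
                });

        // Keep only the successfully parsed values.
        DataSet<Double> doubles = parsed.flatMap(
                new FlatMapFunction<Either<Double, String>, Double>() {
                    @Override
                    public void flatMap(Either<Double, String> e, Collector<Double> out) {
                        if (e.isLeft()) {
                            out.collect(e.left());
                        }
                    }
                });

        // Keep only the strings that could not be parsed.
        DataSet<String> failed = parsed.flatMap(
                new FlatMapFunction<Either<Double, String>, String>() {
                    @Override
                    public void flatMap(Either<Double, String> e, Collector<String> out) {
                        if (e.isRight()) {
                            out.collect(e.right());
                        }
                    }
                });

        doubles.print();  // each print() triggers its own job execution
        failed.print();
    }
}
```

In a real job the two print() calls would be replaced by two sinks, one per target table, so that both outputs are produced by a single execution.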

Best, Fabian


(In reply to the message from Егор Литвиненко <[hidden email]>, 2017-07-27 8:46 GMT+02:00.)