Type Hints in the Java API

Stephan Ewen
Hi everyone!

We recently introduced type hints for the Java API. Since that is a pretty useful feature, I wanted to quickly explain what it is.

Kudos to Timo Walther, who did a large part of this work.


Background

Flink tries to figure out as much information about what types enter and leave user functions as possible.

 - For the POJO API (where one refers to field names), we need that information to make checks (for typos and type compatibility) before the job is executed.

 - For the upcoming logical programs (see roadmap draft) we need this to know the "schema" of functions.

 - The more we know, the better the serialization and data layout schemes that the compiler/optimizer can develop. That is quite important for the memory usage paradigm in Flink (work on serialized data inside/outside the heap and make serialization very cheap).

 - Finally, it also spares users from having to worry about serialization frameworks and from having to register types with those frameworks.


Problem

Scala is an easy case, because the Scala compiler can hand us the generic type information (ClassTags / Type Manifests), but Java erases generic type info in most cases.

We do reflection analysis on the user function classes to get the generic types. This logic also contains some simple type inference in case the functions have type variables (such as a MapFunction<T, Tuple2<T, Long>>).
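To illustrate the kind of reflection analysis meant here, the following is a minimal sketch using only the JDK reflection API. It is not Flink's actual type extraction code: the `Mapper` interface is a hypothetical stand-in for a user function interface such as MapFunction, and `ParseLength` and `mapperTypeArguments` are names made up for this example. It shows that a class which fixes its type arguments in its declaration keeps them readable at runtime.

```java
import java.lang.reflect.ParameterizedType;
import java.lang.reflect.Type;

public class GenericInterfaceInspection {

    /** Hypothetical stand-in for a user function interface such as MapFunction. */
    interface Mapper<I, O> {
        O map(I value);
    }

    /** A concrete user function: both type arguments are fixed in the class declaration. */
    static class ParseLength implements Mapper<String, Integer> {
        @Override
        public Integer map(String value) {
            return value.length();
        }
    }

    /**
     * Returns the type arguments that the given class declares for Mapper,
     * or null if the class does not directly implement a parameterized Mapper.
     */
    static Type[] mapperTypeArguments(Class<?> functionClass) {
        for (Type iface : functionClass.getGenericInterfaces()) {
            if (iface instanceof ParameterizedType) {
                ParameterizedType parameterized = (ParameterizedType) iface;
                if (parameterized.getRawType() == Mapper.class) {
                    return parameterized.getActualTypeArguments();
                }
            }
        }
        return null;
    }

    public static void main(String[] args) {
        Type[] types = mapperTypeArguments(ParseLength.class);
        System.out.println("input:  " + types[0].getTypeName());  // java.lang.String
        System.out.println("output: " + types[1].getTypeName());  // java.lang.Integer
    }
}
```

The sketch only looks at directly implemented interfaces. A real analysis would also have to walk superclasses and substitute type variables along the way.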

We cannot reliably figure out the data types of functions in all cases in Java. Some issues remain with generic lambdas (we are trying to solve this with the Java community, see below) and with generic type variables that we cannot infer.
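The following sketch shows one such case with plain JDK reflection. Again, this is an illustration and not Flink code: `Mapper`, `Identity`, and `declaredOutputType` are hypothetical names. When a function class is itself generic, the type argument chosen at the instantiation site (`Long` here) is erased, and reflection only sees the type variable.

```java
import java.lang.reflect.ParameterizedType;
import java.lang.reflect.Type;
import java.lang.reflect.TypeVariable;

public class UnresolvedTypeVariableSketch {

    /** Hypothetical stand-in for a user function interface such as MapFunction. */
    interface Mapper<I, O> {
        O map(I value);
    }

    /** A generic user function: T is only chosen where the object is created. */
    static class Identity<T> implements Mapper<T, T> {
        @Override
        public T map(T value) {
            return value;
        }
    }

    /**
     * Returns the output type argument that the function's class declares for Mapper,
     * or null if the class exposes no parameterized Mapper interface.
     */
    static Type declaredOutputType(Object function) {
        for (Type iface : function.getClass().getGenericInterfaces()) {
            if (iface instanceof ParameterizedType) {
                ParameterizedType parameterized = (ParameterizedType) iface;
                if (parameterized.getRawType() == Mapper.class) {
                    return parameterized.getActualTypeArguments()[1];
                }
            }
        }
        return null;
    }

    public static void main(String[] args) {
        // The <Long> below exists only at compile time; the runtime class is just Identity.
        Type fromGenericClass = declaredOutputType(new Identity<Long>());
        System.out.println("Identity<Long> output type: " + fromGenericClass);
        System.out.println("is an unresolved type variable: "
                + (fromGenericClass instanceof TypeVariable));

        // A lambda is compiled to a synthetic class; what it exposes depends on the compiler/JVM.
        Mapper<String, Integer> lambda = s -> s.length();
        System.out.println("lambda output type: " + declaredOutputType(lambda));
    }
}
```

For `Identity<Long>` the result is the type variable `T`, not `Long`. For the lambda, the generated class typically carries no generic signature at all, so nothing can be recovered from it either. These are the situations in which the system needs to be told the type explicitly.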


Solution: Type Hints

To make these cases work easily, a recent addition to the 0.9-SNAPSHOT master introduced type hints. They allow you to tell the system the types that it cannot infer.

You can write code like

DataSet<SomeType> result = 
        dataSet.map(new MyGenericNonInferrableFunction<Long, SomeType>()).returns(SomeType.class);


To make specification of generic types easier, it also comes with a parser for simple string representations of generic types:

  .returns("Tuple2<Integer, my.SomeType>")


We suggest using this instead of the "ResultTypeQueryable" workaround that has been used in some cases.


Improving Type information in Java

One Flink committer (Timo Walther) has become active in the Eclipse JDT compiler community and in the OpenJDK community to try to improve the way type information is made available for lambdas.


Greetings,
Stephan