How do I ensure binary comparisons are being used?

Posted by ljwagerfield on
URL: http://deprecated-apache-flink-user-mailing-list-archive.369.s1.nabble.com/How-do-I-ensure-binary-comparisons-are-being-used-tp10806.html

I'd like to understand which operations can actually leverage "binary comparisons" when using the DataStream API. This is regarding the optimisation you receive when using Flink's own built-in serialization stack as opposed to Avro/Kryo/Json/etc... whereby fields are compared without the object needing to be deserialized.

I'm expecting that all operations which use field positions (i.e. `keyBy` and `partitionCustom`) leverage binary comparisons, since they don't require a reference to a deserialized object. **However I cannot see any other methods that take field-offset parameters... they all take callbacks... so does anything else in the DataStream API actually perform binary comparisons?**

For example, the `join` operation requires a closure to perform the equality check... which means the objects must be deserialized before being passed into the closure... unless something really clever is happening under-the-hood?

Please can you provide a list of stream operations that perform binary comparisons / avoid deserialization?

Thanks!