How do I ensure binary comparisons are being used?

classic Classic list List threaded Threaded
4 messages Options
Reply | Threaded
Open this post in threaded view
|

How do I ensure binary comparisons are being used?

ljwagerfield
I'd like to understand which operations can actually leverage "binary comparisons" when using the DataStream API. This is regarding the optimisation you receive when using Flink's own built-in serialization stack as opposed to Avro/Kryo/Json/etc... whereby fields are compared without the object needing to be deserialized.

I'm expecting that all operations which use field positions (i.e. `keyBy` and `partitionCustom`) leverage binary comparisons, since they don't require a reference to a deserialized object. **However I cannot see any other methods that take field-offset parameters... they all take callbacks... so does anything else in the DataStream API actually perform binary comparisons?**

For example, the `join` operation requires a closure to perform the equality check... which means the objects must be deserialized before being passed into the closure... unless something really clever is happening under-the-hood?

Please can you provide a list of stream operations that perform binary comparisons / avoid deserialization?

Thanks!
Reply | Threaded
Open this post in threaded view
|

Re: How do I ensure binary comparisons are being used?

ljwagerfield
Any insights on this?

Thanks,
Lawrence
Reply | Threaded
Open this post in threaded view
|

Re: How do I ensure binary comparisons are being used?

Fabian Hueske-2
Hi Lawrence,

comparison of binary data are mainly used by the DataSet API when sorting large data sets or building and probing hash tables.

The DataStream API mainly benefits from Flink's custom and efficient serialization when sending data over the wire or taking checkpoints.
There are also plans to implement a state backend based on the serialization stack which leverages Flink's managed memory instead of holding object on the heap (the RocksDB state backend is the current solution to avoid this).

From what I know, the DataStream API does not perform compare on serialized data.

Best, Fabian



2017-01-03 7:53 GMT+01:00 ljwagerfield <[hidden email]>:
Any insights on this?

Thanks,
Lawrence



--
View this message in context: http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/How-do-I-ensure-binary-comparisons-are-being-used-tp10806p10819.html
Sent from the Apache Flink User Mailing List archive. mailing list archive at Nabble.com.

Reply | Threaded
Open this post in threaded view
|

Re: How do I ensure binary comparisons are being used?

ljwagerfield
Thank you Fabian :)