Dear Flink developers,

I have a question concerning the preservation of hash values.

I have a hashmap keyed by Scala objects that directly inherit the hashCode() and equals() methods from Any. (These objects are only used to address values in the hashmap mentioned above; they aren't used as keys in any Flink operation such as groupBy, for instance.)

I use a collection of such object keys in a MapReduce program. Unfortunately, during the Flink map and reduce phases the objects change their hash codes and become inconsistent with the keys of the original hashmap. However, I need them to preserve their hash values so that they can still be used as keys.

The objects/classes discussed above are part of a 3rd-party library. Visibility issues prevent me from simply extending them to override the hashCode() and equals() methods. Currently, I work with a cloned version of that library in which I extend the corresponding class in the following manner:
class Key {

  final val hash = System.identityHashCode(this) // cache the hash value

  override def hashCode(): Int = {
    this.hash
  }

  // ... original code
}
First experiments suggest that the hash values are indeed preserved during MapReduce. However, hacking the library is a very clumsy approach.

My question now is: does Flink provide a more elegant solution?
Thanks for your help!
Regards,
Robert Schwarzenberg
Hi Robert,
> Unfortunately, during the Flink map and reduce phases the objects change
> their hash codes and become inconsistent with the keys of the original hashmap

If objects change their hash code values, then this means they are no longer equal. If this is not desired, then the implementation of hashCode() is flawed. You will have to change it or work around it by wrapping it.

Concerning System.identityHashCode(Object o): this will only ever be the same for objects inside a single instance of a JVM. As soon as you serialize the object and transfer it to another JVM (which happens during cluster execution), it will change. When you execute your job locally, this appears to work because serialization is skipped when shipping the classes to the workers. However, later during shuffling the network stack will perform serialization, so the hashCode() value changes even during local execution.

Cheers,
Max
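A minimal sketch of the wrapping workaround Max suggests, assuming the third-party class exposes some stable content to derive the hash from. The class ThirdPartyKey and its field name below are hypothetical stand-ins for the actual library class, not part of any real API:

// Hypothetical stand-in for the third-party class; its hashCode()/equals()
// come straight from Any, i.e. they are identity-based.
class ThirdPartyKey(val name: String) extends Serializable

// Wrapper whose hashCode()/equals() are derived from stable content of the
// wrapped object (here the assumed field `name`) rather than from its
// identity, so the values survive serialization across JVMs.
class StableKey(val underlying: ThirdPartyKey) extends Serializable {
  override def hashCode(): Int = underlying.name.hashCode
  override def equals(other: Any): Boolean = other match {
    case that: StableKey => that.underlying.name == underlying.name
    case _               => false
  }
}

object StableKeyExample extends App {
  // Key the lookup map by the wrapper instead of the raw library object;
  // a freshly constructed (or deserialized) ThirdPartyKey with the same
  // content still finds its entry.
  val lookup = Map(new StableKey(new ThirdPartyKey("a")) -> 1,
                   new StableKey(new ThirdPartyKey("b")) -> 2)
  println(lookup(new StableKey(new ThirdPartyKey("a")))) // prints 1
}

This keeps the third-party library untouched; only the hashmap and the Flink job need to use the wrapper type as the key.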