Dear Flink developers,

I have a question concerning the preservation of hash values.

I have a hashmap keyed by Scala objects that directly inherit the hashCode() and equals() methods from Any. (These objects are only used to address values in the hashmap mentioned above; they aren't used as keys in any Flink operation such as groupBy, for instance.)

I use a collection of such object keys in a MapReduce program. Unfortunately, during the Flink map and reduce phases the objects change their hash codes and become inconsistent with the keys of the original hashmap. However, I need them to preserve their hash values so that they can still be used as keys.

The objects/classes discussed above are part of a 3rd-party library. Visibility issues prevent me from simply extending them to override the hashCode() and equals() methods. Currently, I work with a cloned version of that library in which I extend the corresponding class in the following manner:
class Key {

  final val hash = System.identityHashCode(this) // cache the hash value

  override def hashCode(): Int = {
    this.hash
  }

  // ... original code
}
First experiments suggest that the hash values are indeed preserved during MapReduce. However, hacking the library is a very clumsy approach.

My question now is: does Flink provide a more elegant solution?
Thanks for your help!
Regards,
Robert Schwarzenberg
Hi Robert,
> Unfortunately, during the Flink map and reduce phases the objects change
> their hash codes and become inconsistent with the keys of the original hashmap

If objects change their hash code values, then this means they are no longer equal. If this is not desired, then the implementation of hashCode() is flawed. You will have to change it or work around it by wrapping it.

Concerning System.identityHashCode(Object o): this will only ever be the same for objects inside a single instance of a JVM. As soon as you serialize the object and transfer it to another JVM (which happens during cluster execution), it will change. When you execute your job locally, this appears to work because serialization is skipped when shipping the classes to the workers. However, later during shuffling the network stack will perform serialization, so the hashCode() value changes even during local execution.

Cheers,
Max
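A minimal sketch of the wrapping workaround Max suggests, assuming the third-party class exposes some stable content to derive the hash from. The class ThirdPartyKey and its field name below are hypothetical stand-ins for the actual library class, not part of any real API:

// Hypothetical stand-in for the third-party class; its hashCode()/equals()
// come straight from Any, i.e. they are identity-based.
class ThirdPartyKey(val name: String) extends Serializable

// Wrapper whose hashCode()/equals() are derived from stable content of the
// wrapped object (here the assumed field `name`) rather than from its
// identity, so the values survive serialization across JVMs.
class StableKey(val underlying: ThirdPartyKey) extends Serializable {
  override def hashCode(): Int = underlying.name.hashCode
  override def equals(other: Any): Boolean = other match {
    case that: StableKey => that.underlying.name == underlying.name
    case _               => false
  }
}

object StableKeyExample extends App {
  // Key the lookup map by the wrapper instead of the raw library object;
  // a freshly constructed (or deserialized) ThirdPartyKey with the same
  // content still finds its entry.
  val lookup = Map(new StableKey(new ThirdPartyKey("a")) -> 1,
                   new StableKey(new ThirdPartyKey("b")) -> 2)
  println(lookup(new StableKey(new ThirdPartyKey("a")))) // prints 1
}

This keeps the third-party library untouched; only the hashmap and the Flink job need to use the wrapper type as the key.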