Arrays values in keyBy

classic Classic list List threaded Threaded
5 messages Options
Reply | Threaded
Open this post in threaded view
|

Arrays values in keyBy

Elias Levy
I would be useful if the documentation warned what type of equality it expected of values used as keys in keyBy.  I just got bit in the ass by converting a field from a string to a byte array.  All of the sudden the windows were no longer aggregating.  So it seems Flink is not doing a deep compare of arrays when comparing keys.
Reply | Threaded
Open this post in threaded view
|

Re: Arrays values in keyBy

Aljoscha Krettek
Yes, this is correct. Right now we're basically using <key>.hashCode() for keying. (Which can be problematic in some cases.)

Beam, for example, clearly specifies that the encoded form of a value should be used for all comparisons/hashing. This is more well defined but can lead to slow performance in some cases.

On Sat, 11 Jun 2016 at 00:04 Elias Levy <[hidden email]> wrote:
I would be useful if the documentation warned what type of equality it expected of values used as keys in keyBy.  I just got bit in the ass by converting a field from a string to a byte array.  All of the sudden the windows were no longer aggregating.  So it seems Flink is not doing a deep compare of arrays when comparing keys.
Reply | Threaded
Open this post in threaded view
|

Re: Arrays values in keyBy

Ufuk Celebi
Would make sense to update the Javadocs for the next release.

On Mon, Jun 13, 2016 at 11:19 AM, Aljoscha Krettek <[hidden email]> wrote:

> Yes, this is correct. Right now we're basically using <key>.hashCode() for
> keying. (Which can be problematic in some cases.)
>
> Beam, for example, clearly specifies that the encoded form of a value should
> be used for all comparisons/hashing. This is more well defined but can lead
> to slow performance in some cases.
>
> On Sat, 11 Jun 2016 at 00:04 Elias Levy <[hidden email]> wrote:
>>
>> I would be useful if the documentation warned what type of equality it
>> expected of values used as keys in keyBy.  I just got bit in the ass by
>> converting a field from a string to a byte array.  All of the sudden the
>> windows were no longer aggregating.  So it seems Flink is not doing a deep
>> compare of arrays when comparing keys.
Reply | Threaded
Open this post in threaded view
|

Re: Arrays values in keyBy

Stephan Ewen
I thing we can simply add this behavior when we use the TypeComparator in the keyBy() function. It can implement the hashCode() as a deepHashCode on array types.

On Mon, Jun 13, 2016 at 12:30 PM, Ufuk Celebi <[hidden email]> wrote:
Would make sense to update the Javadocs for the next release.

On Mon, Jun 13, 2016 at 11:19 AM, Aljoscha Krettek <[hidden email]> wrote:
> Yes, this is correct. Right now we're basically using <key>.hashCode() for
> keying. (Which can be problematic in some cases.)
>
> Beam, for example, clearly specifies that the encoded form of a value should
> be used for all comparisons/hashing. This is more well defined but can lead
> to slow performance in some cases.
>
> On Sat, 11 Jun 2016 at 00:04 Elias Levy <[hidden email]> wrote:
>>
>> I would be useful if the documentation warned what type of equality it
>> expected of values used as keys in keyBy.  I just got bit in the ass by
>> converting a field from a string to a byte array.  All of the sudden the
>> windows were no longer aggregating.  So it seems Flink is not doing a deep
>> compare of arrays when comparing keys.

Reply | Threaded
Open this post in threaded view
|

Re: Arrays values in keyBy

rmetzger0
I've filed a JIRA for this issue: https://issues.apache.org/jira/browse/FLINK-5874

On Wed, Jul 20, 2016 at 4:32 PM, Stephan Ewen <[hidden email]> wrote:
I thing we can simply add this behavior when we use the TypeComparator in the keyBy() function. It can implement the hashCode() as a deepHashCode on array types.

On Mon, Jun 13, 2016 at 12:30 PM, Ufuk Celebi <[hidden email]> wrote:
Would make sense to update the Javadocs for the next release.

On Mon, Jun 13, 2016 at 11:19 AM, Aljoscha Krettek <[hidden email]> wrote:
> Yes, this is correct. Right now we're basically using <key>.hashCode() for
> keying. (Which can be problematic in some cases.)
>
> Beam, for example, clearly specifies that the encoded form of a value should
> be used for all comparisons/hashing. This is more well defined but can lead
> to slow performance in some cases.
>
> On Sat, 11 Jun 2016 at 00:04 Elias Levy <[hidden email]> wrote:
>>
>> I would be useful if the documentation warned what type of equality it
>> expected of values used as keys in keyBy.  I just got bit in the ass by
>> converting a field from a string to a byte array.  All of the sudden the
>> windows were no longer aggregating.  So it seems Flink is not doing a deep
>> compare of arrays when comparing keys.