POJO serialization vs immutability

POJO serialization vs immutability

Stephen Connolly

That means that the fields cannot be final.

That means that hashCode() should probably just return a constant value (otherwise an object could be mutated and then lost from a hash-based collection).
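To make the concern concrete, here is a minimal plain-Java sketch (no Flink involved) of an object being "lost" from a HashSet after mutation. The class name Event and its id field are hypothetical and only serve as an illustration.

```java
import java.util.HashSet;
import java.util.Objects;
import java.util.Set;

public class MutableKeyLostSketch {
    /** Hypothetical mutable POJO whose hashCode() depends on a mutable field. */
    static class Event {
        private String id;

        public Event() {}
        public Event(String id) { this.id = id; }

        public String getId() { return id; }
        public void setId(String id) { this.id = id; }

        @Override
        public boolean equals(Object o) {
            return o instanceof Event && Objects.equals(id, ((Event) o).id);
        }

        @Override
        public int hashCode() { return Objects.hashCode(id); }
    }

    /** Adds an element to a set, mutates it, then looks the same instance up again. */
    static boolean stillFoundAfterMutation() {
        Set<Event> seen = new HashSet<>();
        Event e = new Event("a");
        seen.add(e);
        e.setId("b"); // the hash code changes while e is stored under the old one
        return seen.contains(e);
    }

    public static void main(String[] args) {
        System.out.println(stillFoundAfterMutation()); // false
    }
}
```

The set still holds the element, but a lookup hashes the new field value and misses the entry stored under the old hash.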

Is it really the case that we have to either register a serializer or abandon immutability and consequently force hashCode to be a constant value?

What are the recommended implementation patterns for the POJOs used in a topology?

Thanks

-Stephen

Re: POJO serialization vs immutability

Chesnay Schepler

This question should only be relevant for cases where POJOs are used as keys. In that case hashCode() must return neither a class-constant nor an effectively random value, as either would break the hash partitioning.
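A rough sketch of why a value-based hashCode() matters for keys. The route() method below is a deliberately simplified stand-in for hash partitioning and is not Flink's actual key assignment; UserKey is a hypothetical key type.

```java
import java.util.Objects;

public class KeyRoutingSketch {
    /** Hypothetical key type with value-based equals() and hashCode(). */
    static final class UserKey {
        private final String userId;

        UserKey(String userId) { this.userId = userId; }

        @Override
        public boolean equals(Object o) {
            return o instanceof UserKey && Objects.equals(userId, ((UserKey) o).userId);
        }

        @Override
        public int hashCode() { return Objects.hashCode(userId); }
    }

    /** Simplified stand-in for hash partitioning: picks a parallel instance from the hash code. */
    static int route(Object key, int parallelism) {
        return Math.floorMod(key.hashCode(), parallelism);
    }

    public static void main(String[] args) {
        UserKey a = new UserKey("alice");
        UserKey b = new UserKey("alice"); // separate instance with the same content
        // Equal keys have equal hash codes, so they are routed to the same instance.
        System.out.println(route(a, 4) == route(b, 4)); // true
    }
}
```

With an identity-based hashCode() the two instances would generally get unrelated hash codes, so records for the same logical key could end up on different instances.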

This is somewhat alluded to in the keyBy() documentation, but could be clarified.

It is in any case heavily discouraged to modify objects after they have been emitted from a function; the mutability of POJOs is hence usually not a problem.

In reply to the post by Stephen Connolly of 02/10/2019 14:17.

Re: POJO serialization vs immutability

Jan Lukavský

Hi Stephen,

I found a very nice article [1], which might help you solve the issues you are concerned about. The elegant solution to this problem might be summarized as "do not implement equals() and hashCode() for POJO types, use Object's default implementation". I'm not 100% sure that this will not have any negative impacts on some other Flink components, but I _suppose_ it should not (someone might correct me if I'm wrong).

Jan

[1] http://web.mit.edu/6.031/www/sp17/classes/15-equality/

In reply to the post by Chesnay Schepler of 10/7/19 1:37 PM.

Re: POJO serialization vs immutability

Chesnay Schepler
The default hashCode implementation is effectively random and not suited for keys, as records with equal keys may not be routed to the same instance.
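A small plain-Java sketch of the same point, assuming a hypothetical class Plain that keeps Object's default equals() and hashCode(). Two instances with identical content are treated as two different keys.

```java
import java.util.HashMap;
import java.util.Map;

public class DefaultIdentitySketch {
    /** Hypothetical POJO that does not override equals() or hashCode(). */
    static class Plain {
        public String id;

        public Plain() {}
        public Plain(String id) { this.id = id; }
    }

    /** Counts occurrences per key and returns how many distinct keys the map ended up with. */
    static int distinctKeys() {
        Map<Plain, Integer> counts = new HashMap<>();
        counts.merge(new Plain("a"), 1, Integer::sum);
        // Second instance with the same content, as a copied or deserialized record would be.
        counts.merge(new Plain("a"), 1, Integer::sum);
        return counts.size();
    }

    public static void main(String[] args) {
        System.out.println(distinctKeys()); // 2
    }
}
```

Because the default equals() compares identity, the two entries never merge, whatever their hash codes happen to be.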

In reply to the post by Jan Lukavský of 07/10/2019 14:54.

Re: POJO serialization vs immutability

Jan Lukavský

Exactly. And that's why it is a good fit for mutable data, which is not suited for use as keys either.

Jan

In reply to the post by Chesnay Schepler of 10/7/19 2:58 PM.

Re[2]: POJO serialization vs immutability

Протченко Алексей
Sorry, but what about immutability in general? It seems there is no way to have normal immutable chunks inside the stream (but mutable chunks inside a stream seem to be some kind of "code smell"). Or am I just missing something?
 
Best regards,
Alex
 
In reply to the post by Jan Lukavský of Monday, 7 October 2019, 16:13 +03:00.

Re: POJO serialization vs immutability

Jan Lukavský
In reply to this post by Jan Lukavský

Having said that, the same logic applies to using POJOs as keys in grouping operations, which rely heavily on hashCode() and equals(). That might suggest that using mutable objects is not the best option there either. But that might be a very subjective claim.

Jan


Re: POJO serialization vs immutability

Arvid Heise-3
The POJOs that Flink supports follow the Java Bean style, so they are mutable.

I agree that direct support for immutable types would be desirable, but in this case, we need to differentiate a bit more.
Any mutable object can be effectively immutable if its state is not changed after a certain point. Such objects can safely be used as keys in maps.

In our case, you can also use mutable objects in Flink for grouping operations etc. In fact, Flink uses defensive copies in some places to actually turn the returned object "immutable".
Also see ExecutionConfig#enableObjectReuse() / disableObjectReuse():
> By default, objects are not reused in Flink. Enabling the object reuse mode will instruct the runtime to reuse user objects for better performance. Keep in mind that this can lead to bugs when the user-code function of an operation is not aware of this behavior.
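
The kind of bug that warning refers to can be sketched in plain Java, without any Flink classes. The Reading type and the buffer() helper are hypothetical. They only imitate a producer that reuses one instance and a consumer that holds on to references.

```java
import java.util.ArrayList;
import java.util.List;

public class ObjectReuseSketch {
    /** Hypothetical mutable record type. */
    static class Reading {
        public int value;

        public Reading() {}
        public Reading(int value) { this.value = value; }

        Reading copy() { return new Reading(value); }
    }

    /**
     * Imitates a producer that reuses a single instance for every record,
     * while the consumer buffers what it receives.
     */
    static List<Integer> buffer(boolean defensiveCopy) {
        Reading reused = new Reading();
        List<Reading> buffered = new ArrayList<>();
        for (int i = 1; i <= 3; i++) {
            reused.value = i; // the producer overwrites the same object
            buffered.add(defensiveCopy ? reused.copy() : reused);
        }
        List<Integer> values = new ArrayList<>();
        for (Reading r : buffered) {
            values.add(r.value);
        }
        return values;
    }

    public static void main(String[] args) {
        System.out.println(buffer(false)); // [3, 3, 3]
        System.out.println(buffer(true));  // [1, 2, 3]
    }
}
```

Without the copy, every buffered reference points at the one reused object, so only the last value survives. The defensive copy trades an allocation per record for safety.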

equals() and hashCode() should be implemented correctly, ideally generated by your IDE.
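
As an illustration only, a Java Bean style class with value-based equals() and hashCode() might look like the sketch below. SensorId and its fields are hypothetical names. The class stays effectively immutable as long as nothing calls the setters after the object has been emitted.

```java
import java.util.Objects;

public class SensorId {
    private String site;
    private int channel;

    /** Public no-argument constructor, as expected for Java Bean style classes. */
    public SensorId() {}

    public SensorId(String site, int channel) {
        this.site = site;
        this.channel = channel;
    }

    public String getSite() { return site; }
    public void setSite(String site) { this.site = site; }

    public int getChannel() { return channel; }
    public void setChannel(int channel) { this.channel = channel; }

    @Override
    public boolean equals(Object o) {
        if (this == o) {
            return true;
        }
        if (!(o instanceof SensorId)) {
            return false;
        }
        SensorId other = (SensorId) o;
        return channel == other.channel && Objects.equals(site, other.site);
    }

    @Override
    public int hashCode() {
        // Derived only from field values, so equal content gives equal hash codes.
        return Objects.hash(site, channel);
    }
}
```

Two instances built from the same field values compare equal and hash identically, which is what grouping by key relies on.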

Best,

Arvid

In reply to the post by Jan Lukavský of Mon, Oct 7, 2019 at 4:55 PM.