I notice https://ci.apache.org/projects/flink/flink-docs-stable/dev/types_serialization.html#rules-for-pojo-types says that all non-transient fields need a setter. That means that the fields cannot be final, which in turn means that hashCode() should probably just return a constant value (otherwise an object could be mutated and then lost from a hash-based collection). Is it really the case that we have to either register a serializer or abandon immutability and consequently force hashCode() to be a constant value? What are the recommended implementation patterns for the POJOs used in a topology? Thanks -Stephen |
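The concern about mutated objects getting lost from a hash-based collection can be reproduced with plain Java collections. The following is only a minimal sketch; the `Event` class and its `id` field are hypothetical names chosen for illustration and do not come from Flink.

```java
import java.util.HashSet;
import java.util.Objects;
import java.util.Set;

class MutableKeyDemo {
    // Hypothetical mutable POJO whose hashCode() depends on a settable field.
    static class Event {
        private String id;

        public Event() {}
        public Event(String id) { this.id = id; }

        public String getId() { return id; }
        public void setId(String id) { this.id = id; }

        @Override
        public boolean equals(Object o) {
            return o instanceof Event && Objects.equals(id, ((Event) o).id);
        }

        @Override
        public int hashCode() { return Objects.hashCode(id); }
    }

    // Returns whether the set still finds the event after its field was changed.
    static boolean foundAfterMutation() {
        Set<Event> set = new HashSet<>();
        Event event = new Event("a");
        set.add(event);
        event.setId("b"); // the hash changes, but the entry was stored under the old hash
        return set.contains(event);
    }

    public static void main(String[] args) {
        System.out.println(foundAfterMutation()); // prints "false"
    }
}
```

The object is still in the set, but a lookup no longer finds it because the set searches under the new hash value.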
This question should only be relevant for cases where POJOs are used as keys, in which case their hashCode() must return neither a class-constant nor an effectively random value, as this would break the hash partitioning. This is somewhat alluded to in the keyBy() documentation, but could be clarified. It is in any case heavily discouraged to modify objects after they have been emitted from a function; the mutability of POJOs is hence usually not a problem. On 02/10/2019 14:17, Stephen Connolly wrote:
|
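To illustrate why a class-constant hashCode() is a poor fit for keys, here is a rough sketch of hash partitioning. It assumes a deliberately simplified model (hash modulo parallelism); Flink's actual key assignment is more involved, and `partitionFor`, `histogram`, and `ConstantHashKey` are hypothetical names.

```java
import java.util.Arrays;

class PartitionSketch {
    // Simplified stand-in for hash partitioning (an assumption, not Flink's real algorithm).
    static int partitionFor(Object key, int parallelism) {
        return Math.floorMod(key.hashCode(), parallelism);
    }

    // Hypothetical key type whose hashCode() is the same for every instance.
    static class ConstantHashKey {
        private final String name;

        ConstantHashKey(String name) { this.name = name; }

        @Override
        public boolean equals(Object o) {
            return o instanceof ConstantHashKey && name.equals(((ConstantHashKey) o).name);
        }

        @Override
        public int hashCode() { return 42; }
    }

    // Counts how many of the given keys land in each partition.
    static int[] histogram(Object[] keys, int parallelism) {
        int[] counts = new int[parallelism];
        for (Object key : keys) {
            counts[partitionFor(key, parallelism)]++;
        }
        return counts;
    }

    public static void main(String[] args) {
        Object[] constantKeys = {
            new ConstantHashKey("a"), new ConstantHashKey("b"), new ConstantHashKey("c")
        };
        Object[] stringKeys = { "a", "b", "c" };
        System.out.println(Arrays.toString(histogram(constantKeys, 4))); // prints "[0, 0, 3, 0]"
        System.out.println(Arrays.toString(histogram(stringKeys, 4)));   // prints "[0, 1, 1, 1]"
    }
}
```

In this model every constant-hash key is routed to the same partition, so the work is not distributed at all, whereas value-based hashes spread the keys out.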
Hi Stephen, I found a very nice article [1], which might help you solve the issues you are concerned about. The elegant solution to this problem might be summarized as "do not implement equals() and hashCode() for POJO types, use Object's default implementation". I'm not 100% sure that this will not have any negative impacts on some other Flink components, but I _suppose_ it should not (someone might correct me if I'm wrong). Jan [1] http://web.mit.edu/6.031/www/sp17/classes/15-equality/ On 10/7/19 1:37 PM, Chesnay Schepler
wrote:
|
The default hashCode implementation is
effectively random and not suited for keys as they may not be
routed to the same instance.
On 07/10/2019 14:54, Jan Lukavský
wrote:
|
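The difference between Object's default implementation and a value-based one can be shown with an ordinary HashMap. This is a small sketch using hypothetical `IdentityKey` and `ValueKey` classes; it uses only the JDK and says nothing about Flink internals.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

class IdentityHashDemo {
    // Hypothetical key that relies on Object's default equals() and hashCode().
    static class IdentityKey {
        final String id;

        IdentityKey(String id) { this.id = id; }
    }

    // Hypothetical key with value-based equals() and hashCode().
    static class ValueKey {
        final String id;

        ValueKey(String id) { this.id = id; }

        @Override
        public boolean equals(Object o) {
            return o instanceof ValueKey && Objects.equals(id, ((ValueKey) o).id);
        }

        @Override
        public int hashCode() { return Objects.hashCode(id); }
    }

    // Looks up a freshly created key with the same content as the stored one.
    static boolean identityLookupSucceeds() {
        Map<IdentityKey, Integer> map = new HashMap<>();
        map.put(new IdentityKey("user-1"), 1);
        return map.containsKey(new IdentityKey("user-1"));
    }

    static boolean valueLookupSucceeds() {
        Map<ValueKey, Integer> map = new HashMap<>();
        map.put(new ValueKey("user-1"), 1);
        return map.containsKey(new ValueKey("user-1"));
    }

    public static void main(String[] args) {
        System.out.println(identityLookupSucceeds()); // prints "false"
        System.out.println(valueLookupSucceeds());    // prints "true"
    }
}
```

Two instances carrying the same data are treated as unrelated keys under the default implementation, which is the same reason such a type cannot be relied on to route equal records to the same place.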
Exactly. And that's why it is good for mutable data, because mutable objects are not suited for keys either. Jan On 10/7/19 2:58 PM, Chesnay Schepler
wrote:
|
Sorry, but what about immutability in general? It seems there is no way to have normal immutable chunks inside the stream (but mutable chunks inside a stream seem to be some kind of "code smell"). Or am I just missing something?
Best regards,
Alex
Monday, 7 October 2019, 16:13 +03:00 from Jan Lukavský <[hidden email]>: --
Алексей Протченко |
In reply to this post by Jan Lukavský
Having said that - the same logic applies to using POJOs as keys in grouping operations, which heavily rely on hashCode() and equals(). That might suggest that using mutable objects is not the best option there either. But that might be a very subjective claim. Jan On 10/7/19 3:13 PM, Jan Lukavský wrote:
|
The POJOs that Flink supports follow the Java Bean style, so they are mutable. I agree that direct support for immutable types would be desirable, but in this case, we need to differentiate a bit more. Any mutable object can be effectively immutable if its state is not changed after a certain point. Such objects can safely be used as keys in maps. In our case, you can also use mutable objects in Flink for grouping operations etc. In fact, Flink uses defensive copies in some places to actually turn the returned object "immutable". Also see ExecutionConfig#enableObjectReuse() / disableObjectReuse() > By default, objects are not reused in Flink. Enabling the object reuse mode will instruct the runtime to reuse user objects for better performance. Keep in mind that this can lead to bugs when the user-code function of an operation is not aware of this behavior. equals() and hashCode() should be implemented correctly, ideally generated by your IDE. Best, Arvid On Mon, Oct 7, 2019 at 4:55 PM Jan Lukavský <[hidden email]> wrote:
|
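As a rough sketch of the pattern described in this thread, a Java Bean style POJO with value-based equals() and hashCode() could look like the following. The `SensorReading` class, its fields, and the `copy()` helper are hypothetical; the class is kept "effectively immutable" purely by convention, that is, by not calling the setters after an instance has been emitted.

```java
import java.util.Objects;

class PojoSketch {
    // Hypothetical POJO: public class, public no-argument constructor,
    // and a getter/setter pair for every field.
    public static class SensorReading {
        private String sensorId;
        private long timestamp;

        public SensorReading() {}

        public SensorReading(String sensorId, long timestamp) {
            this.sensorId = sensorId;
            this.timestamp = timestamp;
        }

        public String getSensorId() { return sensorId; }
        public void setSensorId(String sensorId) { this.sensorId = sensorId; }

        public long getTimestamp() { return timestamp; }
        public void setTimestamp(long timestamp) { this.timestamp = timestamp; }

        // Defensive copy: change the copy instead of an already emitted instance.
        public SensorReading copy() {
            return new SensorReading(sensorId, timestamp);
        }

        @Override
        public boolean equals(Object o) {
            if (this == o) return true;
            if (!(o instanceof SensorReading)) return false;
            SensorReading other = (SensorReading) o;
            return timestamp == other.timestamp && Objects.equals(sensorId, other.sensorId);
        }

        @Override
        public int hashCode() {
            return Objects.hash(sensorId, timestamp);
        }
    }

    public static void main(String[] args) {
        SensorReading original = new SensorReading("s-1", 1000L);
        SensorReading changed = original.copy();
        changed.setTimestamp(2000L); // only the copy is modified
        System.out.println(original.equals(changed)); // prints "false"
    }
}
```

The setters exist so that the type can be handled as a POJO, while user code treats emitted instances as read-only and works on copies when a change is needed.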