Hi all,

I am trying to understand the situation with state serialization in Flink. I'm looking at a number of sources, but slide 35 from here crystallizes my confusion:

So, I understand that if 'Flink's own serialization stack' is unable to serialize a type you define, then it will fall back on Kryo generics. In this case, I believe what I'm being told is that state compatibility is difficult to ensure, and schema evolution in your jobs is not possible.

However, on this slide they say "Kryo is generally not recommended ... Serialization frameworks with schema evolution support is recommended: Avro, Thrift"

So is this implying that Flink's non-default serialization stack does not support schema evolution? In this case, is it best practice to register custom serializers whenever possible?

Thanks

Grab is hiring. Learn more at https://grab.careers By communicating with Grab Inc and/or its subsidiaries, associate companies and jointly controlled entities (“Grab Group”), you are deemed to have consented to processing of your personal data as set out in the Privacy Notice which can be viewed at https://grab.com/privacy/ This email contains confidential information and is only for the intended recipient(s). If you are not the intended recipient(s), please do not disseminate, distribute or copy this email and notify Grab Group immediately if you have received this by mistake and delete this email from your system. Email transmission cannot be guaranteed to be secure or error-free as any information therein could be intercepted, corrupted, lost, destroyed, delayed or incomplete, or contain viruses. Grab Group do not accept liability for any errors or omissions in the contents of this email arises as a result of email transmission. All intellectual property rights in this email and attachments therein shall remain vested in Grab Group, unless otherwise provided by law.
Hi,

Yes, if Flink does not recognize your registered state type, it will by default use Kryo for the serialization. And generally speaking, Kryo does not have good support for evolvable schemas compared to other serialization frameworks such as Avro or Protobuf.

Flink defaults to Kryo for unrecognized types for historical reasons: its type serialization stack was originally used on the batch side. IMO the short answer is that it would make sense to have a different default serializer (perhaps Avro) for snapshotting state in streaming programs. However, I believe this would be better suited as a separate discussion thread.

The good news is that with Flink 1.7, state schema evolution is fully supported out of the box for Avro types, such as GenericRecord or code-generated SpecificRecords. If you want an evolvable schema for your state types, it is recommended to use Avro types as state types. Support for evolving the schema of other data types, such as POJOs and Scala case classes, is also on the radar for future releases.

Does this help answer your question?

By the way, I would consider the slides you are looking at quite outdated for the topic, since Flink 1.7 was released with much smoother support for state schema evolution. An updated version of the slides is not yet publicly available, but if you want I can send you one privately. Otherwise, the Flink docs for 1.7 would be equally helpful.

Cheers,
Gordon

On Fri, Dec 21, 2018, 8:11 PM Padarn Wilson <[hidden email]> wrote:
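To make the Avro point above a bit more concrete, here is a minimal stdlib-only sketch of the kind of schema resolution Avro applies when data written with an older schema is read with a newer one: a field missing from the written data takes the reader's default, and a written field the reader no longer knows is ignored. This is not the Avro or Flink API. The class `SchemaResolutionSketch`, the method `resolve`, and the field names `userId`, `country` and `legacyFlag` are all made up for illustration, and records are modeled as plain maps.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Stdlib-only sketch of Avro-style schema resolution; NOT the Avro or Flink API. */
final class SchemaResolutionSketch {

    /**
     * Reads a record written with an older schema through a newer reader schema.
     * readerDefaults maps every field name of the reader schema to its default value
     * (hypothetical representation, chosen only to keep the sketch small).
     */
    static Map<String, Object> resolve(Map<String, Object> written,
                                       Map<String, Object> readerDefaults) {
        Map<String, Object> result = new LinkedHashMap<>();
        for (Map.Entry<String, Object> field : readerDefaults.entrySet()) {
            String name = field.getKey();
            // Field present in the written data: keep the written value.
            // Field missing (added after the data was written): use the reader's default.
            result.put(name, written.containsKey(name) ? written.get(name) : field.getValue());
        }
        // Fields that exist only in the written data (removed since) are dropped.
        return result;
    }

    public static void main(String[] args) {
        // State as it was written by the old job version.
        Map<String, Object> oldState = new LinkedHashMap<>();
        oldState.put("userId", 42L);
        oldState.put("legacyFlag", true);

        // Schema of the new job version: "country" was added, "legacyFlag" removed.
        Map<String, Object> readerSchema = new LinkedHashMap<>();
        readerSchema.put("userId", 0L);
        readerSchema.put("country", "unknown");

        System.out.println(resolve(oldState, readerSchema));
    }
}
```

A format like this can tolerate added and removed fields because the data is matched to the schema by field name and the schema carries defaults, which is the property a generic Kryo fallback does not give you by default.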
For the documents I would recommend reading through:

On Fri, Dec 21, 2018, 9:55 PM Tzu-Li (Gordon) Tai <[hidden email]> wrote:
Yes, that helps a lot! Just to clarify:

- If using Avro types in 1.7, no explicit declaration of serializers needs to be done to have state evolution. But all other evolvable types (e.g. Protobuf) still need to be registered and evolved manually?
- If specifying `disableGenericTypes` on my execution config, anything that falls back to Kryo will cause an error?

Would love to see more updated slides if you don't mind.

Thanks for taking the time,
Padarn

On Fri, Dec 21, 2018 at 10:04 PM Tzu-Li (Gordon) Tai <[hidden email]> wrote:
1. Correct. Under the hood, evolvability of a schema relies on the type's serializer implementation supporting it. In Flink 1.7, this has been done only for Flink's built-in Avro serializer (i.e. the AvroSerializer class) for now, so you don't need to provide a custom serializer for Avro types. For any other type, a custom serializer is required for now; again, how to implement a custom serializer that works for schema evolution is covered in the documents.

2. Yes, disabling generic types will make the job fail if any data type is determined to be serialized by Kryo, whether for on-wire data transmission or for state serialization.

I'm currently still traveling because of the recent Flink Forward event; I will send you a copy of the latest slides I presented about the topic once I get back.

Cheers,
Gordon

On Fri, Dec 21, 2018, 10:42 PM Padarn Wilson <[hidden email]> wrote:
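As a rough illustration of point 1 above (evolvability depends on the serializer supporting it), here is a minimal stdlib-only sketch of a serializer that tags each record with a format version, so that bytes written by an older job version can still be read after a field is added. It deliberately does not use Flink's serializer interfaces, and it is only one possible approach. The class `VersionedUserSerializer`, the type `User`, its fields, and the `"unknown"` default are all hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

/** Stdlib-only sketch of a version-aware serializer; NOT Flink's serializer API. */
final class VersionedUserSerializer {

    static final int V1 = 1; // layout: version, id
    static final int V2 = 2; // layout: version, id, country

    /** Hypothetical state type; "country" was added in V2. */
    static final class User {
        final long id;
        final String country;

        User(long id, String country) {
            this.id = id;
            this.country = country;
        }
    }

    /** Writes the old V1 layout, standing in for bytes taken from an old snapshot. */
    static byte[] writeV1(long id) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(bytes);
            out.writeInt(V1);
            out.writeLong(id);
            out.flush();
            return bytes.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Writes the current V2 layout. */
    static byte[] write(User user) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(bytes);
            out.writeInt(V2);
            out.writeLong(user.id);
            out.writeUTF(user.country);
            out.flush();
            return bytes.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Reads either layout; V1 records get a default for the field added in V2. */
    static User read(byte[] data) {
        try {
            DataInputStream in = new DataInputStream(new ByteArrayInputStream(data));
            int version = in.readInt();
            long id = in.readLong();
            switch (version) {
                case V1:
                    return new User(id, "unknown");
                case V2:
                    return new User(id, in.readUTF());
                default:
                    throw new IllegalStateException("Unsupported format version " + version);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        User restored = read(writeV1(7L));               // old bytes, new reader
        System.out.println(restored.id + " " + restored.country);
        User roundTrip = read(write(new User(8L, "SG"))); // current format
        System.out.println(roundTrip.id + " " + roundTrip.country);
    }
}
```

The point of the sketch is only that the reading side has to know about every format it may encounter. A real Flink custom serializer would follow the approach described in the Flink 1.7 documentation rather than this hand-rolled format.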
Thanks for the clarification, that is clear now. Look forward to seeing your slides, safe travels. On Sat, Dec 22, 2018 at 8:25 AM Tzu-Li (Gordon) Tai <[hidden email]> wrote: