Serializers and Schemas


Matt
Hello,

I don't quite understand how to integrate Kafka and Flink. After a lot of thought and hours of reading, I feel I'm still missing something important.

So far I haven't found a non-trivial but simple example of a stream of a custom class (POJO). It would be good to have such an example in the Flink docs. I can think of many scenarios in which using SimpleStringSchema is not an option, yet all Kafka+Flink guides insist on using it.

Maybe we can add a simple example to the documentation [1]; it would be really helpful for many of us. Also, explaining how to create a Flink De/SerializationSchema from a Kafka De/Serializer would save a lot of people a lot of time. Right now it's not clear why you need both of them, or whether you need both at all.

As far as I know, Avro is a common choice for serialization, but I've read that Kryo's performance is much better (is that true?). I'd guess, though, that the fastest approach is writing your own de/serializer.

1. What do you think about adding some thoughts on this to the documentation?
2. Can anyone provide an example for the following class?

---
public class Product {
    public String code;
    public double price;
    public String description;
    public long created;
}
---
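
For concreteness, this is the kind of example I mean: a hand-rolled Flink schema for exactly this class. The sketch below is only an illustration; the binary layout is an arbitrary choice of mine, and the imports are for recent Flink versions (older ones keep these interfaces under org.apache.flink.streaming.util.serialization).

---
import java.io.*;
import org.apache.flink.api.common.serialization.DeserializationSchema;
import org.apache.flink.api.common.serialization.SerializationSchema;
import org.apache.flink.api.common.typeinfo.TypeInformation;

public class ProductSchema implements DeserializationSchema<Product>, SerializationSchema<Product> {

    @Override
    public byte[] serialize(Product p) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(bytes);
            out.writeUTF(p.code);          // length-prefixed modified UTF-8
            out.writeDouble(p.price);
            out.writeUTF(p.description);
            out.writeLong(p.created);
            out.flush();
            return bytes.toByteArray();
        } catch (IOException e) {
            throw new RuntimeException("Could not serialize Product", e);
        }
    }

    @Override
    public Product deserialize(byte[] message) throws IOException {
        DataInputStream in = new DataInputStream(new ByteArrayInputStream(message));
        Product p = new Product();
        p.code = in.readUTF();             // must read fields in the same order
        p.price = in.readDouble();
        p.description = in.readUTF();
        p.created = in.readLong();
        return p;
    }

    @Override
    public boolean isEndOfStream(Product nextElement) {
        return false;                      // a Kafka stream never "ends"
    }

    @Override
    public TypeInformation<Product> getProducedType() {
        return TypeInformation.of(Product.class);
    }
}
---

A schema like this can be passed to the Flink Kafka consumer/producer constructors in place of SimpleStringSchema.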

Regards,
Matt

Re: Serializers and Schemas

Luigi Selmi
Hi Matt,

I had the same problem: trying to read some records in event time using a POJO, doing some transformations, and saving the result back into Kafka for further processing. I'm not done yet, but maybe the code I wrote, starting from the Flink Forward 2016 training docs, could be useful.
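
The gist of it is a string-based schema roughly like the sketch below (a sketch only, with names assumed from your Product class; the fields are joined with tabs):

---
import java.nio.charset.StandardCharsets;
import org.apache.flink.api.common.serialization.DeserializationSchema;
import org.apache.flink.api.common.serialization.SerializationSchema;
import org.apache.flink.api.common.typeinfo.TypeInformation;

public class ProductTsvSchema implements DeserializationSchema<Product>, SerializationSchema<Product> {

    @Override
    public byte[] serialize(Product p) {
        // Join the fields with tabs; note this breaks if a field itself contains a tab.
        String line = p.code + "\t" + p.price + "\t" + p.description + "\t" + p.created;
        return line.getBytes(StandardCharsets.UTF_8);
    }

    @Override
    public Product deserialize(byte[] message) {
        String[] fields = new String(message, StandardCharsets.UTF_8).split("\t");
        Product p = new Product();
        p.code = fields[0];
        p.price = Double.parseDouble(fields[1]);
        p.description = fields[2];
        p.created = Long.parseLong(fields[3]);
        return p;
    }

    @Override
    public boolean isEndOfStream(Product nextElement) {
        return false;
    }

    @Override
    public TypeInformation<Product> getProducedType() {
        return TypeInformation.of(Product.class);
    }
}
---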



Best,

Luigi 

--
Luigi Selmi, M.Sc.
Fraunhofer IAIS, Schloss Birlinghoven
53757 Sankt Augustin, Germany
Phone: +49 2241 14-2440

Re: Serializers and Schemas

Matt
I've read your example, but it has the same problem: you're serializing your POJO as a string, with all the fields separated by "\t". This may work for you, but not in general.


I would like to see a more generic approach for the Product class in my last message. I believe a general-purpose de/serializer for POJOs should be achievable using reflection.
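
Something like the following sketch is what I have in mind, here using Jackson to do the reflective field mapping (Jackson is just a stand-in, and the class name is made up; any reflection-based mapper would do):

---
import java.io.IOException;
import com.fasterxml.jackson.databind.ObjectMapper;
import org.apache.flink.api.common.serialization.DeserializationSchema;
import org.apache.flink.api.common.serialization.SerializationSchema;
import org.apache.flink.api.common.typeinfo.TypeInformation;

// Works for any class Jackson can map reflectively: public fields or
// getters/setters plus a no-arg constructor, which Flink POJOs have anyway.
public class PojoSchema<T> implements DeserializationSchema<T>, SerializationSchema<T> {

    // static: ObjectMapper is not Serializable, so keep it out of the
    // schema instance that Flink ships to the task managers
    private static final ObjectMapper MAPPER = new ObjectMapper();

    private final Class<T> clazz;

    public PojoSchema(Class<T> clazz) {
        this.clazz = clazz;
    }

    @Override
    public byte[] serialize(T element) {
        try {
            return MAPPER.writeValueAsBytes(element);
        } catch (IOException e) {
            throw new RuntimeException(e);
        }
    }

    @Override
    public T deserialize(byte[] message) throws IOException {
        return MAPPER.readValue(message, clazz);
    }

    @Override
    public boolean isEndOfStream(T nextElement) {
        return false;
    }

    @Override
    public TypeInformation<T> getProducedType() {
        return TypeInformation.of(clazz);
    }
}
---

On the Flink side it would then be used as, e.g., new FlinkKafkaConsumer09<>("products", new PojoSchema<>(Product.class), props).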

Re: Serializers and Schemas

milind parikh

Why not use a self-describing format (JSON), stream it as a String, and read it through a JSON reader, avoiding top-level reflection?

Github.com/milindparikh/streamingsi

https://github.com/milindparikh/streamingsi/tree/master/epic-poc/sprint-2-simulated-data-no-cdc-advanced-eventing/2-dataprocessing


Apologies if I misunderstood the question, but I can quite see how to model your Product class (or indeed any POJO) in a fairly generic way (assuming JSON).

The real issue, which shows up when you have different versions of the same POJO class, is that it requires storing enough information to dynamically instantiate the actual version of the class; that, I believe, is beyond the simple use case.
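
In code the idea is just this (a sketch; the topic, group id, and servers are placeholders, and the consumer class name depends on your Kafka and Flink versions):

---
import java.util.Properties;
import com.fasterxml.jackson.databind.ObjectMapper;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumer09;
import org.apache.flink.streaming.util.serialization.SimpleStringSchema;

public class JsonStringJob {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        Properties props = new Properties();
        props.setProperty("bootstrap.servers", "localhost:9092");
        props.setProperty("group.id", "products");

        // Stream plain JSON strings: no custom schema needed at all.
        DataStream<String> json =
                env.addSource(new FlinkKafkaConsumer09<>("products", new SimpleStringSchema(), props));

        // Parse into the POJO inside the topology. (A real job would reuse
        // one ObjectMapper, e.g. in a RichMapFunction, instead of creating
        // one per record.)
        DataStream<Product> products = json
                .map(line -> new ObjectMapper().readValue(line, Product.class))
                .returns(Product.class);

        products.print();
        env.execute("json-as-string");
    }
}
---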

Milind

Re: Serializers and Schemas

Tzu-Li (Gordon) Tai
Hi Matt,

1. There’s some in-progress work on wrapper util classes for Kafka de/serializers here [1] that allows Kafka de/serializers to be used with the Flink Kafka consumers/producers with minimal user overhead. The PR also proposes some additions to the documentation for the wrappers.
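
The gist of such a wrapper, sketched here with a made-up class name (the actual code in the PR may differ in the details):

---
import java.io.IOException;
import org.apache.flink.api.common.serialization.DeserializationSchema;
import org.apache.flink.api.common.typeinfo.TypeInformation;
import org.apache.kafka.common.serialization.Deserializer;

// Adapts an existing Kafka Deserializer<T> to Flink's DeserializationSchema<T>.
// Kafka deserializers are generally not java.io.Serializable, so we ship the
// Class and instantiate lazily on the task managers.
public class KafkaDeserializerWrapper<T> implements DeserializationSchema<T> {

    private final Class<? extends Deserializer<T>> deserializerClass;
    private final TypeInformation<T> typeInfo;
    private transient Deserializer<T> deserializer;

    public KafkaDeserializerWrapper(Class<? extends Deserializer<T>> deserializerClass,
                                    TypeInformation<T> typeInfo) {
        this.deserializerClass = deserializerClass;
        this.typeInfo = typeInfo;
    }

    @Override
    public T deserialize(byte[] message) throws IOException {
        if (deserializer == null) {
            try {
                deserializer = deserializerClass.newInstance();
            } catch (Exception e) {
                throw new IOException("Could not instantiate Kafka deserializer", e);
            }
        }
        // The topic is not known at this point; many deserializers ignore it.
        return deserializer.deserialize(null, message);
    }

    @Override
    public boolean isEndOfStream(T nextElement) {
        return false;
    }

    @Override
    public TypeInformation<T> getProducedType() {
        return typeInfo;
    }
}
---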

2. I feel it would be good to have more documentation on Flink’s de/serializers, because they’ve been asked about frequently on the mailing lists. At the same time, the fastest / most efficient de/serialization approach will likely be tailored to each use case, so we’d need to think more about the presentation and purpose of that documentation.

Cheers,
Gordon


Re: Serializers and Schemas

Matt
Hi people,

This is what I was talking about regarding a generic de/serializer for POJO classes [1].

The Serde class in [2] can be used in both Kafka [3] and Flink [4], and it works out of the box for any POJO class.

Do you see anything wrong with this approach? Any way to improve it?
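
In outline it is a single class implementing both the Kafka and the Flink interfaces over one byte format, roughly like this (a sketch of the shape only, not the actual code in [2]; here Jackson does the byte mapping, as discussed above):

---
import java.io.IOException;
import java.util.Map;
import com.fasterxml.jackson.databind.ObjectMapper;
import org.apache.flink.api.common.serialization.DeserializationSchema;
import org.apache.flink.api.common.serialization.SerializationSchema;
import org.apache.flink.api.common.typeinfo.TypeInformation;
import org.apache.kafka.common.serialization.Deserializer;
import org.apache.kafka.common.serialization.Serializer;

// One class usable as a Kafka Serializer/Deserializer and as a Flink
// SerializationSchema/DeserializationSchema, so both sides agree on the format.
public class Serde<T> implements Serializer<T>, Deserializer<T>,
                                 SerializationSchema<T>, DeserializationSchema<T> {

    private static final ObjectMapper MAPPER = new ObjectMapper();

    private final Class<T> clazz;

    public Serde(Class<T> clazz) {
        this.clazz = clazz;
    }

    // ---- Flink ----

    @Override
    public byte[] serialize(T element) {
        try {
            return MAPPER.writeValueAsBytes(element);
        } catch (IOException e) {
            throw new RuntimeException(e);
        }
    }

    @Override
    public T deserialize(byte[] message) throws IOException {
        return MAPPER.readValue(message, clazz);
    }

    @Override
    public boolean isEndOfStream(T nextElement) {
        return false;
    }

    @Override
    public TypeInformation<T> getProducedType() {
        return TypeInformation.of(clazz);
    }

    // ---- Kafka ----

    @Override
    public byte[] serialize(String topic, T data) {
        return serialize(data);
    }

    @Override
    public T deserialize(String topic, byte[] data) {
        try {
            return deserialize(data);
        } catch (IOException e) {
            throw new RuntimeException(e);
        }
    }

    @Override
    public void configure(Map<String, ?> configs, boolean isKey) {
        // nothing to configure
    }

    @Override
    public void close() {
        // nothing to close
    }
}
---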

Cheers,
Matt



