Hello,
I don't quite understand how to integrate Kafka and Flink. After a lot of thought and hours of reading, I feel I'm still missing something important.

So far I haven't found a non-trivial but simple example of streaming a custom class (POJO). It would be good to have such an example in the Flink docs. I can think of many scenarios in which using SimpleStringSchema is not an option, yet all Kafka+Flink guides insist on using it. Maybe we can add a simple example to the documentation [1]; it would be really helpful for many of us.

Also, explaining how to create a Flink De/SerializationSchema from a Kafka De/Serializer would be really useful and would save a lot of people a lot of time. It's not clear why you need both of them, or whether you need both of them.

As far as I know, Avro is a common choice for serialization, but I've read that Kryo's performance is much better (true?). I guess, though, that the fastest approach is writing your own de/serializer.

1. What do you think about adding some thoughts on this to the documentation?

2. Can anyone provide an example for the following class? (See the sketch in the P.S. below for the kind of thing I imagine.)

---
public class Product {
    public String code;
    public double price;
    public String description;
    public long created;
}
---

Regards,
Matt
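P.S. To make question 2 concrete, this is roughly the kind of example I'd hope the docs would show: a class implementing both of Flink's schema interfaces for Product, writing the fields out by hand. Just a sketch — the field order and binary layout are something I made up, not any standard:

---
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import org.apache.flink.api.common.typeinfo.TypeInformation;
import org.apache.flink.api.java.typeutils.TypeExtractor;
import org.apache.flink.streaming.util.serialization.DeserializationSchema;
import org.apache.flink.streaming.util.serialization.SerializationSchema;

// Sketch: hand-written schema for Product, usable with the Flink Kafka consumer and producer.
public class ProductSchema implements DeserializationSchema<Product>, SerializationSchema<Product> {

    @Override
    public byte[] serialize(Product p) {
        try {
            ByteArrayOutputStream bos = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(bos);
            out.writeUTF(p.code);          // field order must match deserialize()
            out.writeDouble(p.price);
            out.writeUTF(p.description);
            out.writeLong(p.created);
            out.flush();
            return bos.toByteArray();
        } catch (IOException e) {
            throw new RuntimeException("Failed to serialize Product", e);
        }
    }

    @Override
    public Product deserialize(byte[] bytes) throws IOException {
        DataInputStream in = new DataInputStream(new ByteArrayInputStream(bytes));
        Product p = new Product();
        p.code = in.readUTF();
        p.price = in.readDouble();
        p.description = in.readUTF();
        p.created = in.readLong();
        return p;
    }

    @Override
    public boolean isEndOfStream(Product nextElement) {
        return false;  // the Kafka stream never ends
    }

    @Override
    public TypeInformation<Product> getProducedType() {
        return TypeExtractor.getForClass(Product.class);
    }
}
---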
Hi Matt,

I had the same problem: reading some records in event time using a POJO, doing some transformations, and saving the result into Kafka for further processing. I am not done yet, but maybe the code I wrote starting from the Flink Forward 2016 training docs could be useful.

Best,

Luigi

On 7 December 2016 at 16:35, Matt <[hidden email]> wrote:
Luigi Selmi, M.Sc.
Fraunhofer IAIS
Schloss Birlinghoven, 53757 Sankt Augustin, Germany
Phone: +49 2241 14-2440
I've read your example, but it has the same problem: you're serializing your POJO as a string in which all fields are separated by "\t". This may work for you, but not in general.

I would like to see a more "generic" approach for the class Product in my last message. I believe a general-purpose de/serializer for POJOs should be achievable using reflection (see the sketch below).

On Wed, Dec 7, 2016 at 1:16 PM, Luigi Selmi <[hidden email]> wrote:
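Here is roughly what I mean — an untested sketch that walks the public fields with reflection. It assumes every field is a public String, double, or long, that the class has a public no-arg constructor, and it sorts fields by name since the JVM doesn't guarantee declaration order:

---
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.lang.reflect.Field;
import java.util.Arrays;
import java.util.Comparator;

// Untested sketch: encode/decode any simple POJO by iterating its public fields.
public class ReflectiveSerde<T> {

    private final Class<T> clazz;

    public ReflectiveSerde(Class<T> clazz) {
        this.clazz = clazz;
    }

    // Sort by name: getFields() does not guarantee a stable order across JVMs.
    private Field[] sortedFields() {
        Field[] fields = clazz.getFields();
        Arrays.sort(fields, Comparator.comparing(Field::getName));
        return fields;
    }

    public byte[] serialize(T obj) throws Exception {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(bos);
        for (Field f : sortedFields()) {
            Class<?> t = f.getType();
            if (t == String.class)      out.writeUTF((String) f.get(obj));
            else if (t == double.class) out.writeDouble(f.getDouble(obj));
            else if (t == long.class)   out.writeLong(f.getLong(obj));
            else throw new IllegalArgumentException("Unsupported field type: " + t);
        }
        out.flush();
        return bos.toByteArray();
    }

    public T deserialize(byte[] bytes) throws Exception {
        DataInputStream in = new DataInputStream(new ByteArrayInputStream(bytes));
        T obj = clazz.newInstance();  // requires a public no-arg constructor
        for (Field f : sortedFields()) {
            Class<?> t = f.getType();
            if (t == String.class)      f.set(obj, in.readUTF());
            else if (t == double.class) f.setDouble(obj, in.readDouble());
            else if (t == long.class)   f.setLong(obj, in.readLong());
            else throw new IllegalArgumentException("Unsupported field type: " + t);
        }
        return obj;
    }
}
---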
Why not use a self-describing format (JSON), stream it as a String, and read it through a JSON reader, avoiding top-level reflection? Github.com/milindparikh/streamingsi ? (A quick sketch of what I mean follows below.)

Apologies if I misunderstood the question, but I can see how to model your Product class (or indeed any POJO) in a fairly generic way, assuming JSON.

The real issue arises when you have different versions of the same POJO class: you then need to store enough information to dynamically instantiate the actual version of the class, which I believe is beyond the simple use case.

Milind

On Dec 7, 2016 2:42 PM, "Matt" <[hidden email]> wrote:
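Something along these lines — just a sketch; the topic name, broker address, group id, and consumer version (FlinkKafkaConsumer09 here) are placeholders for whatever you actually use:

---
import java.util.Properties;
import com.fasterxml.jackson.databind.ObjectMapper;
import org.apache.flink.api.common.functions.MapFunction;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumer09;
import org.apache.flink.streaming.util.serialization.SimpleStringSchema;

StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

Properties props = new Properties();
props.setProperty("bootstrap.servers", "localhost:9092");  // placeholder broker
props.setProperty("group.id", "product-consumer");         // placeholder group id

// Consume the topic as raw JSON strings, then map each record to a Product with Jackson.
DataStream<Product> products = env
    .addSource(new FlinkKafkaConsumer09<>("products", new SimpleStringSchema(), props))
    .map(new MapFunction<String, Product>() {
        private final ObjectMapper mapper = new ObjectMapper();  // Jackson's ObjectMapper is Serializable

        @Override
        public Product map(String json) throws Exception {
            return mapper.readValue(json, Product.class);
        }
    });
---

No schema class to write at all, and the payload stays readable on the wire; the trade-off is larger messages and JSON parsing cost per record.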
Hi Matt,

1. There’s some in-progress work on wrapper util classes for Kafka de/serializers here [1] that allows Kafka de/serializers to be used with the Flink Kafka Consumers/Producers with minimal user overhead. The PR also has some proposed additions to the documentation for the wrappers. (Until then, a hand-rolled wrapper along the lines sketched below isn’t much code.)

2. I feel it would be good to have more documentation on Flink’s de/serializers, because they’ve been frequently asked about on the mailing lists. At the same time, the fastest / most efficient de/serialization approach will probably be tailored to each use case, so we’d need to think more about the presentation and purpose of that documentation.

Cheers,
Gordon

On December 8, 2016 at 5:00:19 AM, milind parikh ([hidden email]) wrote:
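To illustrate the general idea (this is just a sketch, not the API from the PR): a wrapper that adapts a Kafka Deserializer to Flink’s DeserializationSchema. The Kafka deserializer is created lazily on the worker because most Kafka de/serializers aren’t Serializable, while Flink schemas must be:

---
import java.io.IOException;
import java.util.Collections;
import org.apache.flink.api.common.typeinfo.TypeInformation;
import org.apache.flink.api.java.typeutils.TypeExtractor;
import org.apache.flink.streaming.util.serialization.DeserializationSchema;
import org.apache.kafka.common.serialization.Deserializer;

// Sketch: adapt a Kafka Deserializer to Flink's DeserializationSchema.
// The Kafka deserializer is referenced by class and instantiated lazily,
// since most Kafka de/serializers are not Serializable.
public class KafkaDeserializerSchema<T> implements DeserializationSchema<T> {

    private final Class<? extends Deserializer<T>> deserializerClass;
    private final Class<T> typeClass;
    private transient Deserializer<T> deserializer;

    public KafkaDeserializerSchema(Class<? extends Deserializer<T>> deserializerClass,
                                   Class<T> typeClass) {
        this.deserializerClass = deserializerClass;
        this.typeClass = typeClass;
    }

    @Override
    public T deserialize(byte[] message) throws IOException {
        if (deserializer == null) {
            try {
                deserializer = deserializerClass.newInstance();
                deserializer.configure(Collections.<String, Object>emptyMap(), false);
            } catch (Exception e) {
                throw new IOException("Could not instantiate Kafka deserializer", e);
            }
        }
        return deserializer.deserialize(null, message);  // the topic name isn't available here
    }

    @Override
    public boolean isEndOfStream(T nextElement) {
        return false;
    }

    @Override
    public TypeInformation<T> getProducedType() {
        return TypeExtractor.getForClass(typeClass);
    }
}
---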
Hi people,

This is what I was talking about regarding a generic de/serializer for POJO classes [1]. The Serde class in [2] can be used in both Kafka [3] and Flink [4], and it works out of the box for any POJO class. Schematically, the shape of the class is sketched below.

Do you see anything wrong with this approach? Any way to improve it?

Cheers,
Matt

On Thu, Dec 8, 2016 at 4:15 AM, Tzu-Li (Gordon) Tai <[hidden email]> wrote:
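In short, the idea is a single class that implements both the Kafka and the Flink interfaces, so the same instance works on either side. A simplified sketch of the shape (here using Jackson for the actual bytes just to keep it short — the linked version has the details):

---
import java.io.IOException;
import java.util.Map;
import com.fasterxml.jackson.databind.ObjectMapper;
import org.apache.flink.api.common.typeinfo.TypeInformation;
import org.apache.flink.api.java.typeutils.TypeExtractor;
import org.apache.flink.streaming.util.serialization.DeserializationSchema;
import org.apache.flink.streaming.util.serialization.SerializationSchema;
import org.apache.kafka.common.serialization.Deserializer;
import org.apache.kafka.common.serialization.Serializer;

// One class, four interfaces: usable as a Kafka Serializer/Deserializer and as a
// Flink SerializationSchema/DeserializationSchema at the same time.
public class PojoSerde<T> implements Serializer<T>, Deserializer<T>,
        SerializationSchema<T>, DeserializationSchema<T> {

    private final Class<T> clazz;
    private final ObjectMapper mapper = new ObjectMapper();  // Serializable, so Flink can ship it

    public PojoSerde(Class<T> clazz) { this.clazz = clazz; }

    private byte[] toBytes(T obj) {
        try { return mapper.writeValueAsBytes(obj); }
        catch (Exception e) { throw new RuntimeException("Serialization failed", e); }
    }

    private T fromBytes(byte[] bytes) {
        try { return mapper.readValue(bytes, clazz); }
        catch (Exception e) { throw new RuntimeException("Deserialization failed", e); }
    }

    // Kafka Serializer / Deserializer
    @Override public void configure(Map<String, ?> configs, boolean isKey) {}
    @Override public byte[] serialize(String topic, T data) { return toBytes(data); }
    @Override public T deserialize(String topic, byte[] data) { return fromBytes(data); }
    @Override public void close() {}

    // Flink SerializationSchema / DeserializationSchema
    @Override public byte[] serialize(T element) { return toBytes(element); }
    @Override public T deserialize(byte[] message) throws IOException { return fromBytes(message); }
    @Override public boolean isEndOfStream(T next) { return false; }
    @Override public TypeInformation<T> getProducedType() { return TypeExtractor.getForClass(clazz); }
}
---

On the Kafka side you pass an instance directly to the producer/consumer constructor (configuring it via properties would instead require a no-arg constructor); on the Flink side you hand the same instance to the Flink Kafka consumer/producer.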