FlinkKafkaProducer API

classic Classic list List threaded Threaded
4 messages Options
Reply | Threaded
Open this post in threaded view
|

FlinkKafkaProducer API

Elias Levy
The FlinkKafkaProducer API seems more difficult to use than it should be.  

The API requires you pass it a SerializationSchema or a KeyedSerializationSchema, but the Kafka producer already has a serialization API.  Requiring a serializer in the Flink API precludes the use of the Kafka serializers.  For instance, they preclude the use of the Confluent KafkaAvroSerializer class that makes use of the Confluent Schema Registry.  Ideally, the serializer would be optional, so as to allow the Kafka producer serializers to handle the task.

In addition, the KeyedSerializationSchema conflates message key extraction with key serialization.  If the serializer were optional, to allow the Kafka producer serializers to take over, you'd still need to extract a key from the message.

And given that the key may not be part of the message you want to write to Kafka, an upstream step may have to package the key with the message to make both available to the sink, for instance in a tuple. That means you also need to define a method to extract the message to write to Kafka from the element passed into the sink by Flink.  

In summary, there should be separation of extraction of the key and message from the element passed into the sink from serialization, and the serialization step should be optional.


Reply | Threaded
Open this post in threaded view
|

Re: FlinkKafkaProducer API

Fabian Hueske-2
Hi Elias,

thanks for your feedback. I think those are good observations and suggestions to improve the Kafka producers.
The best place to discuss such improvements is the dev mailing list.

Would like to repost your mail there or open JIRAs where the discussion about these changes can continue?

Thanks, Fabian

2016-06-09 3:58 GMT+02:00 Elias Levy <[hidden email]>:
The FlinkKafkaProducer API seems more difficult to use than it should be.  

The API requires you pass it a SerializationSchema or a KeyedSerializationSchema, but the Kafka producer already has a serialization API.  Requiring a serializer in the Flink API precludes the use of the Kafka serializers.  For instance, they preclude the use of the Confluent KafkaAvroSerializer class that makes use of the Confluent Schema Registry.  Ideally, the serializer would be optional, so as to allow the Kafka producer serializers to handle the task.

In addition, the KeyedSerializationSchema conflates message key extraction with key serialization.  If the serializer were optional, to allow the Kafka producer serializers to take over, you'd still need to extract a key from the message.

And given that the key may not be part of the message you want to write to Kafka, an upstream step may have to package the key with the message to make both available to the sink, for instance in a tuple. That means you also need to define a method to extract the message to write to Kafka from the element passed into the sink by Flink.  

In summary, there should be separation of extraction of the key and message from the element passed into the sink from serialization, and the serialization step should be optional.



Reply | Threaded
Open this post in threaded view
|

Re: FlinkKafkaProducer API

Elias Levy

On Thu, Jun 9, 2016 at 5:16 AM, Fabian Hueske <[hidden email]> wrote:
thanks for your feedback. I think those are good observations and suggestions to improve the Kafka producers.
The best place to discuss such improvements is the dev mailing list.

Would like to repost your mail there or open JIRAs where the discussion about these changes can continue?

I opened FLINK-4050.  Since the JIRAs are posted to the dev list, I won't cross post.

Cheers,
Elias 


Reply | Threaded
Open this post in threaded view
|

Re: FlinkKafkaProducer API

Fabian Hueske-2
Great, thank you!

2016-06-09 17:38 GMT+02:00 Elias Levy <[hidden email]>:

On Thu, Jun 9, 2016 at 5:16 AM, Fabian Hueske <[hidden email]> wrote:
thanks for your feedback. I think those are good observations and suggestions to improve the Kafka producers.
The best place to discuss such improvements is the dev mailing list.

Would like to repost your mail there or open JIRAs where the discussion about these changes can continue?

I opened FLINK-4050.  Since the JIRAs are posted to the dev list, I won't cross post.

Cheers,
Elias