Output batch to Kafka


Oleksandr Nitavskyi

Hello Squirrels,

 

Flink has a wonderful Kafka connector. We need to move data from HDFS to Kafka. Confluent proposes using Kafka Connect for this, but it would probably be easier for us to use Flink for such a task: a much higher level of abstraction, fewer details to manage, and a better fit for our context.

Do you know whether there is a way to write data to Kafka using the batch approach?

Thanks

Kind Regards

Oleksandr Nitavskyi


Re: Output batch to Kafka

Chesnay Schepler
This depends a little bit on your requirements.
If it is just about reading data from HDFS and writing it into Kafka, then it should be possible to simply wrap a KafkaProducer in a RichMapFunction that you use as a sink in your DataSet program.

However, you could also use the Streaming API for that.
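
The wrapping idea can be sketched without any Flink or Kafka dependency. The model below only mirrors the lifecycle a RichMapFunction provides (open once per parallel task, map once per record, close at the end). All names in it are hypothetical: `RecordSender` stands in for the small part of KafkaProducer (send, flush, close) that the wrapper would call, and `InMemorySender` exists only so the sketch runs without a broker. In a real job the class would presumably extend RichMapFunction and create the KafkaProducer in its open method rather than in the constructor.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Supplier;

// Hypothetical stand-in for the part of KafkaProducer the wrapper needs.
interface RecordSender {
    void send(String topic, String value);
    void flush();
    void close();
}

// In-memory sender so the sketch runs without a Kafka broker.
final class InMemorySender implements RecordSender {
    final List<String> sent = new ArrayList<>();
    boolean flushed = false;
    boolean closed = false;

    @Override public void send(String topic, String value) { sent.add(topic + ":" + value); }
    @Override public void flush() { flushed = true; }
    @Override public void close() { closed = true; }
}

// Mirrors the open/map/close lifecycle of a rich map function used as a sink.
final class ProducerWrappingMapper {
    private final String topic;
    private final Supplier<RecordSender> senderFactory;
    private RecordSender sender; // created per task in open(), not in the constructor

    ProducerWrappingMapper(String topic, Supplier<RecordSender> senderFactory) {
        this.topic = topic;
        this.senderFactory = senderFactory;
    }

    // Called once per parallel task before any record is processed.
    void open() {
        sender = senderFactory.get();
    }

    // Called once per record: forward it to the sender and pass it through.
    String map(String value) {
        sender.send(topic, value);
        return value;
    }

    // Called once at the end: flush buffered records, then release the sender.
    void close() {
        if (sender != null) {
            sender.flush();
            sender.close();
            sender = null;
        }
    }
}
```

Creating the sender in open() matters because the function object is serialized and shipped to the workers, while a producer holds network connections that cannot be shipped. Note also that a DataSet program still needs a terminal sink after the map, for example a DiscardingOutputFormat.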

In reply to Oleksandr Nitavskyi's message of 05.06.2018 00:39.


Re: Output batch to Kafka

Stephan Ewen
You could go with Chesnay's suggestion, which might be the quickest fix.

Creating a KafkaOutputFormat (possibly wrapping the KafkaProducer) would be a bit cleaner. Would be happy to have that as a contribution, actually ;-)

If you care about producing "exactly once" using Kafka Transactions (Kafka 0.11+), it may be a bit more involved. Please let me know if you want details there.
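
As a rough illustration of the transactional pattern only, here is a dependency-free model. `TransactionalSender` is a hypothetical interface that reuses the method names of KafkaProducer's transactional API (beginTransaction, send, commitTransaction, abortTransaction); `InMemoryTransactionalSender` and `AtomicBatchWriter` are likewise made up for this sketch. It does not cover the Flink side (task restarts and retries), which is presumably where the extra work lies.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in that reuses KafkaProducer's transactional method names.
interface TransactionalSender {
    void beginTransaction();
    void send(String topic, String value);
    void commitTransaction();
    void abortTransaction();
}

// In-memory model: records become visible only when the transaction commits.
final class InMemoryTransactionalSender implements TransactionalSender {
    final List<String> committed = new ArrayList<>();
    private final List<String> pending = new ArrayList<>();
    private final String failOn; // value that triggers a simulated failure, or null

    InMemoryTransactionalSender(String failOn) {
        this.failOn = failOn;
    }

    @Override public void beginTransaction() { pending.clear(); }

    @Override public void send(String topic, String value) {
        if (value.equals(failOn)) {
            throw new IllegalStateException("simulated send failure");
        }
        pending.add(topic + ":" + value);
    }

    @Override public void commitTransaction() {
        committed.addAll(pending);
        pending.clear();
    }

    @Override public void abortTransaction() { pending.clear(); }
}

final class AtomicBatchWriter {
    private AtomicBatchWriter() {}

    // Returns true if the whole batch was committed, false if it was aborted.
    static boolean write(TransactionalSender sender, String topic, List<String> batch) {
        sender.beginTransaction();
        try {
            for (String value : batch) {
                sender.send(topic, value);
            }
            sender.commitTransaction();
            return true;
        } catch (RuntimeException e) {
            // Nothing from this attempt becomes visible; the caller may retry.
            sender.abortTransaction();
            return false;
        }
    }
}
```

With a real KafkaProducer, the producer additionally needs a `transactional.id` and a call to initTransactions() before the first transaction, and consumers only skip aborted records when they read with `isolation.level=read_committed`.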


In reply to Chesnay Schepler's message of Tue, Jun 5, 2018 at 8:10 AM.