Flink Batch Processing with Kafka


Alam, Zeeshan

Hi,

Flink works very well with Kafka if you wish to stream data. The following is how I am streaming data from Kafka with Flink:

FlinkKafkaConsumer08<Event> kafkaConsumer = new FlinkKafkaConsumer08<>(KAFKA_AVRO_TOPIC, avroSchema, properties);
DataStream<Event> messageStream = env.addSource(kafkaConsumer);

Is there a way to do a micro-batch operation on the data coming from Kafka? What I want is to reduce or aggregate the events coming from Kafka. For instance, I am getting 40,000 events per second from Kafka, and I want to group 2,000 events into one and send the result to my microservice for further processing. Can I use the Flink DataSet API for this, or should I go with Spark or some other framework?

Thanks & Regards

Zeeshan Alam

Re: Flink Batch Processing with Kafka

vprabhu@gmail.com
If your environment is not kerberized (or if you can afford to restart the job every seven days), a checkpoint-enabled Flink job with windowing and a count trigger would be ideal for your requirement.
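
The checkpointing part is a one-liner on the execution environment; the interval below is just an example value, pick whatever suits your recovery needs:

env.enableCheckpointing(60000); // checkpoint every 60 seconds, for instance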

Check the API docs on Flink windows.

I had something like this that worked:

stream.keyBy(0).countWindow(2000).apply(function)
      .setParallelism(conf.getInt("window.parallelism"));

where "stream" is the DataStream created from the Kafka connector, "function" is a WindowFunction where you would put the "send it to my microservice" part, and keyBy(0) partitions the stream by a key field (the first tuple field here).
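
For completeness, here is a rough end-to-end sketch of the same idea, using the 1.x DataStream API that matches FlinkKafkaConsumer08. Event, avroSchema, properties, and KAFKA_AVRO_TOPIC are from your snippet; the "someKey" field, the Batch type, and sendToMicroservice() are placeholders you would replace with your own types and client code:

import org.apache.flink.api.java.tuple.Tuple;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.api.functions.sink.SinkFunction;
import org.apache.flink.streaming.api.functions.windowing.WindowFunction;
import org.apache.flink.streaming.api.windowing.windows.GlobalWindow;
import org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumer08;
import org.apache.flink.util.Collector;

StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
env.enableCheckpointing(60000); // as above

// Kafka source, same as in your snippet
FlinkKafkaConsumer08<Event> kafkaConsumer =
        new FlinkKafkaConsumer08<>(KAFKA_AVRO_TOPIC, avroSchema, properties);
DataStream<Event> messageStream = env.addSource(kafkaConsumer);

messageStream
    // partition by a field of the Event POJO ("someKey" is a placeholder)
    .keyBy("someKey")
    // fire once 2000 events have arrived for a key
    .countWindow(2000)
    .apply(new WindowFunction<Event, Batch, Tuple, GlobalWindow>() {
        @Override
        public void apply(Tuple key, GlobalWindow window,
                          Iterable<Event> events, Collector<Batch> out) {
            // aggregate the 2000 buffered events into one payload
            out.collect(Batch.from(events)); // placeholder aggregation
        }
    })
    .addSink(new SinkFunction<Batch>() {
        @Override
        public void invoke(Batch batch) throws Exception {
            sendToMicroservice(batch); // placeholder HTTP/RPC client call
        }
    });

env.execute("kafka-count-window-batches");

If you don't actually need per-key grouping, countWindowAll(2000) does the same thing over the whole stream (note that it runs with parallelism 1).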

You could look up the individual methods in the API docs.

Thanks,
Prabhu
