read a finite number of messages from Kafka using Kafka connector without extending it?

read a finite number of messages from Kafka using Kafka connector without extending it?

Yu Yang
Hi, 

We are considering using Flink SQL for ad hoc analytics on real-time Kafka data, and we want to limit queries to data from the past 5-10 minutes. One possible approach is to extend the current Kafka connector so that it only reads messages from a given period of time and produces a finite DataStream. I am wondering whether there is an alternative to this approach. Any suggestions would be very much appreciated.

Regards, 
-Yu


Re: read a finite number of messages from Kafka using Kafka connector without extending it?

Konstantin Knauf-2
Hi Yu,

I am not aware of a way to use the FlinkKafkaConsumer to generate a finite data stream. You could, of course, use a FilterFunction or FlatMapFunction to filter out events outside of the time interval right after the Kafka Source. This way you would not need to modify it, but you have to stop the job manually once no new data is processed.
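To make the filtering idea a bit more concrete, here is a minimal sketch of the time-interval check in plain Java. The `Event` class, its `timestampMillis` field, and the `withinLast` helper are hypothetical names made up for illustration; they are not part of Flink or the Kafka connector. In an actual job, the same comparison would presumably live inside the `filter` method of a `FilterFunction` applied right after the Kafka source, and, as noted above, the job would still have to be stopped manually.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.Arrays;
import java.util.List;
import java.util.function.Predicate;
import java.util.stream.Collectors;

public class TimeWindowFilter {

    /** Hypothetical event type: a payload plus its event time in epoch milliseconds. */
    public static final class Event {
        public final String payload;
        public final long timestampMillis;

        public Event(String payload, long timestampMillis) {
            this.payload = payload;
            this.timestampMillis = timestampMillis;
        }
    }

    /** Returns a predicate that keeps events with a timestamp in [now - lookback, now]. */
    public static Predicate<Event> withinLast(Duration lookback, Instant now) {
        final long upper = now.toEpochMilli();
        final long lower = upper - lookback.toMillis();
        return e -> e.timestampMillis >= lower && e.timestampMillis <= upper;
    }

    public static void main(String[] args) {
        // Fixed reference time so that the example is deterministic.
        Instant now = Instant.parse("2019-02-16T01:10:00Z");

        List<Event> events = Arrays.asList(
            new Event("too old", now.minus(Duration.ofMinutes(30)).toEpochMilli()),
            new Event("recent", now.minus(Duration.ofMinutes(3)).toEpochMilli()),
            new Event("future", now.plus(Duration.ofMinutes(1)).toEpochMilli()));

        // Keep only events from the last 10 minutes relative to 'now'.
        List<String> kept = events.stream()
            .filter(withinLast(Duration.ofMinutes(10), now))
            .map(e -> e.payload)
            .collect(Collectors.toList());

        System.out.println(kept); // prints [recent]
    }
}
```

Note that the reference time is captured once when the predicate is built, so the window is fixed relative to the start of the query rather than sliding along with the wall clock.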

Generally, I think there is no way to read only the messages from a certain time interval of a Kafka topic (regardless of Flink). So you would always need to read more events than you want and then filter.

Cheers,

Konstantin

On Sat, Feb 16, 2019 at 1:10 AM Yu Yang wrote: (original message, quoted in full above)