Hi Yu,
I am not aware of a way to use the FlinkKafkaConsumer to generate a finite data stream. You could, of course, use a FilterFunction or FlatMapFunction right after the Kafka source to filter out events outside of the time interval. This way you would not need to modify the connector, but you would have to stop the job manually once no new data is being processed.
Generally, I think, there is no way to read only the messages from a certain time interval from a Kafka topic (regardless of Flink). So, you would always need to read more events than you need and filter.
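For illustration, the core of that filter could look like the sketch below. This is plain Java with no Flink dependency; the Event type, its timestamp field, and the window bounds are assumptions made up for the example. In an actual Flink job, the same predicate would go into a FilterFunction (or a lambda passed to stream.filter(...)) applied directly after the Kafka source:

```java
import java.util.List;
import java.util.stream.Collectors;

public class TimeWindowFilter {

    // Hypothetical event type carrying an event-time timestamp in millis.
    static final class Event {
        final long timestampMillis;
        final String payload;
        Event(long timestampMillis, String payload) {
            this.timestampMillis = timestampMillis;
            this.payload = payload;
        }
    }

    // Keep only events whose timestamp falls in [startMillis, endMillis).
    // In Flink this predicate would be the body of a FilterFunction, e.g.
    //   stream.filter(e -> e.timestampMillis >= start && e.timestampMillis < end)
    static List<Event> filterWindow(List<Event> events, long startMillis, long endMillis) {
        return events.stream()
                .filter(e -> e.timestampMillis >= startMillis && e.timestampMillis < endMillis)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Event> events = List.of(
                new Event(1_000L, "too early"),
                new Event(5_000L, "inside"),
                new Event(9_999L, "inside"),
                new Event(10_000L, "too late"));
        for (Event e : filterWindow(events, 5_000L, 10_000L)) {
            System.out.println(e.timestampMillis + " " + e.payload);
        }
    }
}
```

As noted above, this only drops out-of-window events; the consumer still reads everything from the topic, and the job keeps running until stopped.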
Cheers,
Konstantin
Hi,
We are considering using Flink SQL for ad-hoc data analytics on real-time Kafka data, and want to limit the queries to data from the past 5-10 minutes. One possible approach is to extend the current Kafka connector so that it only reads messages within a given period of time, generating a finite DataStream. I am wondering whether there is an alternative to this approach. Any suggestions would be very much appreciated.
Regards,
-Yu
--
Konstantin Knauf | Solutions Architect
+49 160 91394525
Follow us @VervericaData
--
Join Flink Forward - The Apache Flink Conference
Stream Processing | Event Driven | Real Time
--
Data Artisans GmbH | Invalidenstrasse 115, 10115 Berlin, Germany
--
Data Artisans GmbH
Registered at Amtsgericht Charlottenburg: HRB 158244 B
Managing Directors: Dr. Kostas Tzoumas, Dr. Stephan Ewen