Reading bounded data from Kafka in Flink job


Reading bounded data from Kafka in Flink job

Marchant, Hayden
I have 2 datasets that I need to join together in a Flink batch job. One of the datasets needs to be created dynamically by completely 'draining' a Kafka topic over a given offset range (start and end) and creating a file containing all messages in that range. I know that in Flink streaming I can specify the start offset, but not the end offset. In my case, this preparation of the file from the Kafka topic really operates on a finite, bounded set of data, even though it comes from Kafka.
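To illustrate, this is the start-offset API I am referring to (a minimal sketch, assuming the Flink 1.4-era FlinkKafkaConsumer011; the topic name, partitions and offset values below are made up):

import java.util.HashMap;
import java.util.Map;
import java.util.Properties;

import org.apache.flink.api.common.serialization.SimpleStringSchema;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumer011;
import org.apache.flink.streaming.connectors.kafka.internals.KafkaTopicPartition;

public class StartOffsetExample {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        Properties props = new Properties();
        props.setProperty("bootstrap.servers", "localhost:9092");
        props.setProperty("group.id", "bounded-read");

        // Start offsets can be pinned per partition ...
        Map<KafkaTopicPartition, Long> startOffsets = new HashMap<>();
        startOffsets.put(new KafkaTopicPartition("my-topic", 0), 100L);
        startOffsets.put(new KafkaTopicPartition("my-topic", 1), 250L);

        FlinkKafkaConsumer011<String> consumer =
                new FlinkKafkaConsumer011<>("my-topic", new SimpleStringSchema(), props);
        consumer.setStartFromSpecificOffsets(startOffsets);
        // ... but there is no matching setter for an end offset.

        env.addSource(consumer).print();
        env.execute("Read from specific start offsets");
    }
}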

Is there a way that I can do this in Flink (either streaming or batch)?

Thanks,
Hayden

Re: Reading bounded data from Kafka in Flink job

Fabian Hueske
Hi Hayden,

As far as I know, an end offset is not supported by Flink's Kafka consumer.
You could extend Flink's consumer: as you said, there is already code to set the starting offset (per partition), so you might be able to piggyback on that.
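
One way to piggyback without forking the consumer could be the deserialization schema: KeyedDeserializationSchema.deserialize() already receives the offset of every record, and returning true from isEndOfStream() stops the consuming subtask. Below is a rough sketch of such a schema (my own hypothetical BoundedOffsetSchema, not something the connector ships); note that it stops the whole subtask, so it is only exact when each subtask reads a single partition or all partitions share the same end offset.

import java.io.IOException;
import java.nio.charset.StandardCharsets;

import org.apache.flink.api.common.typeinfo.BasicTypeInfo;
import org.apache.flink.api.common.typeinfo.TypeInformation;
import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.api.java.typeutils.TupleTypeInfo;
import org.apache.flink.streaming.util.serialization.KeyedDeserializationSchema;

// Hypothetical sketch: emits (offset, payload) tuples and signals end-of-stream
// once the configured end offset has been reached. Not part of the Kafka connector.
public class BoundedOffsetSchema implements KeyedDeserializationSchema<Tuple2<Long, String>> {

    private final long endOffset;    // exclusive: the record at this offset is not emitted
    private boolean reachedEnd;      // set by deserialize(), checked by isEndOfStream()

    public BoundedOffsetSchema(long endOffset) {
        this.endOffset = endOffset;
    }

    @Override
    public Tuple2<Long, String> deserialize(
            byte[] messageKey, byte[] message, String topic, int partition, long offset)
            throws IOException {
        if (offset >= endOffset) {
            reachedEnd = true;       // the consumer checks isEndOfStream() before emitting
        }
        return Tuple2.of(offset, new String(message, StandardCharsets.UTF_8));
    }

    @Override
    public boolean isEndOfStream(Tuple2<Long, String> nextElement) {
        return reachedEnd;
    }

    @Override
    public TypeInformation<Tuple2<Long, String>> getProducedType() {
        return new TupleTypeInfo<>(BasicTypeInfo.LONG_TYPE_INFO, BasicTypeInfo.STRING_TYPE_INFO);
    }
}

You would pass this schema to the consumer together with the per-partition start offsets and write the resulting stream to a file sink; once isEndOfStream() fires, the source finishes and the streaming job can terminate on its own.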

Gordon (in CC), who has worked a lot on the Kafka connector, might have a better idea.

Best, Fabian
