Hi Sagar,
At the moment the number of partitions in the Kafka source topics and the parallelism of the Flink Kafka source operator are completely independent. Flink internally distributes the partitions among however many source subtasks you configure, and with dynamic partition or topic discovery enabled, partitions found while the job is running are assigned automatically as well.
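For illustration, a rough sketch of a pattern-based source with discovery enabled (I am assuming the universal Kafka connector here; the broker address, group id, discovery interval and the parallelism of 8 are placeholders for your own values):

import java.util.Properties;
import java.util.regex.Pattern;

import org.apache.flink.api.common.serialization.SimpleStringSchema;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumer;

public class PatternSourceJob {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        Properties props = new Properties();
        props.setProperty("bootstrap.servers", "localhost:9092"); // placeholder broker
        props.setProperty("group.id", "my-group");                // placeholder group
        // Re-check the pattern periodically so that new topics and new
        // partitions of existing topics are picked up while the job runs.
        props.setProperty("flink.partition-discovery.interval-millis", "30000");

        FlinkKafkaConsumer<String> consumer = new FlinkKafkaConsumer<>(
                Pattern.compile("my_topic_.*"),
                new SimpleStringSchema(),
                props);

        // The source parallelism is chosen independently of the partition
        // counts; Flink spreads all discovered partitions over these subtasks.
        DataStream<String> stream = env.addSource(consumer).setParallelism(8);

        stream.print();
        env.execute("pattern-source");
    }
}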
The job or source parallelism can be set, for example, to the total number of Kafka partitions over all topics if it is known in advance; to determine it programmatically you can query the counts with the Kafka client.
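Something along these lines, as a rough sketch with the Kafka AdminClient (PartitionCounter and totalPartitions are made-up names just for illustration; listTopics and describeTopics are the actual AdminClient calls):

import java.util.Properties;
import java.util.Set;
import java.util.stream.Collectors;

import org.apache.kafka.clients.admin.AdminClient;

public class PartitionCounter {
    /** Sums the partition counts of all topics matching the given regex. */
    public static int totalPartitions(String bootstrapServers, String topicRegex) throws Exception {
        Properties props = new Properties();
        props.setProperty("bootstrap.servers", bootstrapServers); // placeholder
        try (AdminClient admin = AdminClient.create(props)) {
            // List all topics visible to the client, keep the matching ones.
            Set<String> topics = admin.listTopics().names().get().stream()
                    .filter(t -> t.matches(topicRegex))
                    .collect(Collectors.toSet());
            // Describe them and add up the partition counts.
            return admin.describeTopics(topics).all().get().values().stream()
                    .mapToInt(d -> d.partitions().size())
                    .sum();
        }
    }
}

Then you can call env.addSource(consumer).setParallelism(PartitionCounter.totalPartitions("localhost:9092", "my_topic_.*")) when building the job. Note that the count is a snapshot taken at submission time; topics matching the pattern that appear later are still consumed, their partitions are simply distributed over the existing subtasks.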
Cheers,
Andrey
Hi,
We have a use case where we are consuming from over a hundred Kafka topics, each with a different number of partitions.
As per the documentation, to fully parallelize a Kafka topic we need to set setParallelism() equal to the number of Kafka partitions for that topic.
But if we are consuming multiple topics in Flink via a pattern, e.g. my_topic_*, and each topic is configured with a different partition count,
how should we connect all of this together so that we can map Kafka partitions to Flink parallelism correctly and programmatically (so that we don't have to hard-code all the topic names and parallelisms, given that we can access the Kafka topic <-> number-of-partitions mapping in Flink)?