Kafka consumer group id and Flink

classic Classic list List threaded Threaded
2 messages Options
Reply | Threaded
Open this post in threaded view
|

Kafka consumer group id and Flink

Debasish Ghosh
 Hello -

Can someone please point me to some document / code snippet as to how Flink uses Kafka consumer group property "group.id". In the message http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/Flink-kafka-group-question-td8185.html#none I see the following ..

Internally, the Flink Kafka connectors don’t use the consumer group management functionality because they are using lower-level APIs (SimpleConsumer in 0.8, and KafkaConsumer#assign(…) in 0.9) on each parallel instance for more control on individual partition consumption. So, essentially, the “group.id” setting in the Flink Kafka connector is only used for committing offsets back to ZK / Kafka brokers.

This message is quite old - has anything changed since then? Looks like this property is a mandatory setting though.

If I have multiple flink streaming jobs, since each job tracks the offsets individually and saves it by the internal checkpoint mechanism, is there no need to specify a different groupd.id for each job ? And in the case when there are two jobs reading the same topic but has different business logic will that work correctly although the consumers will be in the same consumer-group?

Thanks for any help.
regards.
--
Reply | Threaded
Open this post in threaded view
|

Re: Kafka consumer group id and Flink

Benchao Li
Hi Debasish,

AFAIK, Flink Kafka Connector still works like that, the community will keep the document updated.

Flink Kafka connector has three main modes (and also specific offsets and specific timestamp):
- earliest-offset: no matter what your offset of "group.id" is currently, it always consumes from the earliest offset.
- group-offset: consumes from offsets kept in kafka of the "group.id" (or zookeeper in older kafka versions)
- latest-offset: no matter what your offset of "group.id" is currently, it always consumes from the latest offset.
And it will automatically commit offsets to the kafka for "group.id" you specified.


Plus the checkpoint, if you enable checkpoint, it always start from the offsets kept in checkpoint, no matter what mode you set.
And will commit offsets to the kafka for "group.id" when checkpoint succeeds by default.

Different jobs won't affect each other for consuming. However, if you specify same "group.id" for different jobs, maybe your offsets save to kafka will be messed.

Debasish Ghosh <[hidden email]> 于2020年2月26日周三 下午9:47写道:
 Hello -

Can someone please point me to some document / code snippet as to how Flink uses Kafka consumer group property "group.id". In the message http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/Flink-kafka-group-question-td8185.html#none I see the following ..

Internally, the Flink Kafka connectors don’t use the consumer group management functionality because they are using lower-level APIs (SimpleConsumer in 0.8, and KafkaConsumer#assign(…) in 0.9) on each parallel instance for more control on individual partition consumption. So, essentially, the “group.id” setting in the Flink Kafka connector is only used for committing offsets back to ZK / Kafka brokers.

This message is quite old - has anything changed since then? Looks like this property is a mandatory setting though.

If I have multiple flink streaming jobs, since each job tracks the offsets individually and saves it by the internal checkpoint mechanism, is there no need to specify a different groupd.id for each job ? And in the case when there are two jobs reading the same topic but has different business logic will that work correctly although the consumers will be in the same consumer-group?

Thanks for any help.
regards.
--


--
Benchao Li
School of Electronics Engineering and Computer Science, Peking University
Tel:+86-15650713730
Email: [hidden email]; [hidden email]