It seems that the Flink Kafka consumer doesn't honor group.id when consumers are added dynamically. Say I have some Flink Kafka consumers reading from a topic, and I dynamically add new Flink Kafka consumers with the same group.id: Kafka messages are then duplicated to both the existing and the new consumers. This violates exactly-once semantics, since all the consumers belong to the same consumer group. I believe this is a known issue:

http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/Flink-kafka-group-question-td8185.html#none
https://stackoverflow.com/questions/38639019/flink-kafka-consumer-groupid-not-working

Is there any plan to provide a fix/support for group.id in upcoming releases? Let me know if there is any way to deal with this for now.

--
Sent from: http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/
Hi Giriraj,

The fact that the Flink Kafka consumer doesn't use the group.id property for partition assignment is expected behavior. Because the partition-to-subtask assignment of the Flink Kafka consumer needs to be deterministic in Flink, the consumer uses static assignment instead of Kafka's higher-level dynamic consumer-group assignment.

If I understood you correctly, what you are doing is: while a job with a Kafka consumer is already running, you want to start a new job, also with a Kafka consumer as its source, that uses the same group.id, so that the topic's messages are routed between the two jobs. Is this correct? If so, could you briefly explain what your use case is and why you want to do this? Perhaps this should be tackled from a different angle in Flink's design.

Cheers,
Gordon

On Thu, Jun 28, 2018 at 8:26 PM Giriraj <[hidden email]> wrote:
> It seems that Flink kafka consumer don't honor group.id while consumers are
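Gordon's point about deterministic static assignment can be sketched as follows. This is an illustrative reconstruction, not the actual Flink source: the class and method names are mine, and the hash-based start index only mimics the spirit of Flink's internal topic-partition assigner. The key property is that every subtask can compute the mapping locally, with no group coordination, so the same (topic, partition, parallelism) always lands on the same subtask:

```java
// Hedged sketch of static partition-to-subtask assignment.
// Names and the exact hashing are illustrative, not Flink's real code.
public class StaticAssignmentSketch {

    // Deterministic: no broker-side coordination, no group.id involved.
    static int assign(String topic, int partition, int numSubtasks) {
        // Offset by a topic hash so multiple topics spread over subtasks.
        int startIndex = ((topic.hashCode() * 31) & 0x7FFFFFFF) % numSubtasks;
        return (startIndex + partition) % numSubtasks;
    }

    public static void main(String[] args) {
        int parallelism = 3;
        for (int p = 0; p < 6; p++) {
            System.out.println("partition " + p + " -> subtask "
                    + assign("orders", p, parallelism));
        }
    }
}
```

Because the mapping is a pure function of topic, partition, and parallelism, restoring a job from a checkpoint or savepoint reproduces exactly the same assignment, which is what Flink's exactly-once guarantees rely on.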
Hi Gordon,
Gordon:
> If I understood you correctly, what you are doing is, while a job with a Kafka consumer is already running, you want to start a new job also with a Kafka consumer as the source and uses the same group.id so that the topic's messages are routed between the two jobs. Is this correct? If so, could you briefly explain what your use case is and why you want to do this?

You got it almost right. The so-called "new job" above is a new instance of the same job. We are trying to scale the job by spawning more instances of it; that is, we abstract the Flink job as a microservice, and when the load on this service/job increases, we would like to spawn a new instance of the same job/service. That is why we expect that, when group.id is the same in both jobs, each message should be delivered to only one of the jobs' consumers.

I have also considered scaling the job by increasing parallelism (cancelling and restarting the job with a higher parallelism), but the former approach looked cleaner, more intuitive, and seamless to us. I would really appreciate your thoughts on it. Apologies for the delay in responding.

--
Giriraj
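The duplication Giriraj observes follows directly from the static assignment described above, and can be modeled with a small simulation. This is not Flink API code: it only illustrates the assumption that each separately submitted job instance claims every partition of the topic for itself, regardless of group.id (in Flink, group.id is essentially used for committing offsets back to Kafka, not for dividing partitions between jobs):

```java
import java.util.ArrayList;
import java.util.List;

// Simulation (not Flink API): two separately submitted Flink jobs each
// perform their own static assignment over ALL partitions of the topic,
// so sharing a group.id does not split the traffic between them.
public class DuplicateDeliverySimulation {

    static List<Integer> partitionsClaimedByJob(int numPartitions) {
        List<Integer> claimed = new ArrayList<>();
        for (int p = 0; p < numPartitions; p++) {
            claimed.add(p); // every job instance reads every partition
        }
        return claimed;
    }

    public static void main(String[] args) {
        List<Integer> jobA = partitionsClaimedByJob(4);
        List<Integer> jobB = partitionsClaimedByJob(4); // second instance, same group.id
        // Both lists are identical: every record is delivered to both jobs.
        System.out.println("job A: " + jobA);
        System.out.println("job B: " + jobB);
    }
}
```

This is why scaling by increasing the parallelism of a single job (rather than submitting a second copy of it) is the path that keeps each partition owned by exactly one subtask.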