It seems that the Flink Kafka consumer doesn't honor group.id when consumers are added dynamically. Say I have some Flink Kafka consumers reading from a topic, and I dynamically add new Flink Kafka consumers with the same group.id: Kafka messages are then duplicated to both the existing and the new consumers. This violates exactly-once semantics, since all the consumers belong to the same consumer group. I believe this is a known issue:

http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/Flink-kafka-group-question-td8185.html#none
https://stackoverflow.com/questions/38639019/flink-kafka-consumer-groupid-not-working

Is there any plan to provide a fix/support for group.id in upcoming releases? Let me know if there is any way to deal with this for now.

--
Sent from: http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/
Hi Giriraj,

The fact that the Flink Kafka consumer doesn't use the group.id property for partition assignment is expected behavior. Because the partition-to-subtask assignment of the Flink Kafka consumer needs to be deterministic in Flink, the consumer uses static assignment instead of Kafka's higher-level dynamic consumer-group assignment.

If I understood you correctly, what you are doing is: while a job with a Kafka consumer is already running, you want to start a new job, also with a Kafka consumer as its source, that uses the same group.id, so that the topic's messages are routed between the two jobs. Is this correct? If so, could you briefly explain what your use case is and why you want to do this? Perhaps this should be tackled from a different angle in Flink's design.

Cheers,
Gordon

On Thu, Jun 28, 2018 at 8:26 PM Giriraj <[hidden email]> wrote:
> It seems that Flink kafka consumer don't honor group.id while consumers are
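Gordon's point about deterministic static assignment can be sketched as follows. This is an illustrative reconstruction, not the actual Flink source: the class and method names are mine, and the hash-based start index only mimics the spirit of Flink's internal topic-partition assigner. The key property is that every subtask can compute the mapping locally, with no group coordination, so the same (topic, partition, parallelism) always lands on the same subtask:

```java
// Hedged sketch of static partition-to-subtask assignment.
// Names and the exact hashing are illustrative, not Flink's real code.
public class StaticAssignmentSketch {

    // Deterministic: no broker-side coordination, no group.id involved.
    static int assign(String topic, int partition, int numSubtasks) {
        // Offset by a topic hash so multiple topics spread over subtasks.
        int startIndex = ((topic.hashCode() * 31) & 0x7FFFFFFF) % numSubtasks;
        return (startIndex + partition) % numSubtasks;
    }

    public static void main(String[] args) {
        int parallelism = 3;
        for (int p = 0; p < 6; p++) {
            System.out.println("partition " + p + " -> subtask "
                    + assign("orders", p, parallelism));
        }
    }
}
```

Because the mapping is a pure function of topic, partition, and parallelism, restoring a job from a checkpoint or savepoint reproduces exactly the same assignment, which is what Flink's exactly-once guarantees rely on.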
Hi Gordon,
Gordon:
> If I understood you correctly, what you are doing is, while a job with a Kafka consumer is already running, you want to start a new job also with a Kafka consumer as the source and uses the same group.id so that the topic's messages are routed between the two jobs. Is this correct? If so, could you briefly explain what your use case is and why you want to do this?

You got it almost right. The so-called "new job" above is a new instance of the same job. We are trying to scale the job by spawning more instances of it; that is, we abstract the Flink job as a microservice, and when the load on this service/job increases, we would like to spawn a new instance of the same job/service. That is why we expect that, when group.id is the same in both jobs, each message should be delivered to only one of the jobs' consumers.

I have also considered scaling the job by increasing parallelism (cancelling and restarting the job with a higher parallelism), but the former approach looked cleaner, more intuitive, and seamless to us. I would really appreciate your thoughts on it. Apologies for the delay in responding.

--
Giriraj
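The duplication Giriraj observes follows directly from the static assignment described above, and can be modeled with a small simulation. This is not Flink API code: it only illustrates the assumption that each separately submitted job instance claims every partition of the topic for itself, regardless of group.id (in Flink, group.id is essentially used for committing offsets back to Kafka, not for dividing partitions between jobs):

```java
import java.util.ArrayList;
import java.util.List;

// Simulation (not Flink API): two separately submitted Flink jobs each
// perform their own static assignment over ALL partitions of the topic,
// so sharing a group.id does not split the traffic between them.
public class DuplicateDeliverySimulation {

    static List<Integer> partitionsClaimedByJob(int numPartitions) {
        List<Integer> claimed = new ArrayList<>();
        for (int p = 0; p < numPartitions; p++) {
            claimed.add(p); // every job instance reads every partition
        }
        return claimed;
    }

    public static void main(String[] args) {
        List<Integer> jobA = partitionsClaimedByJob(4);
        List<Integer> jobB = partitionsClaimedByJob(4); // second instance, same group.id
        // Both lists are identical: every record is delivered to both jobs.
        System.out.println("job A: " + jobA);
        System.out.println("job B: " + jobB);
    }
}
```

This is why scaling by increasing the parallelism of a single job (rather than submitting a second copy of it) is the path that keeps each partition owned by exactly one subtask.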