All,
I was wondering what the expected default behavior is when same app is deployed in 2 separate clusters but with same group Id. In theory idea was to create active-active across separate clusters but it seems like both apps are getting all the data from Kafka. Anyone else has tried something similar or have an insight on expected behavior? I was expecting to see partial data on both apps and to get all data in one app if other was turned off.
Thanks in advance, - Ashish |
Looks like Flink is using “assign” partitions instead of “subscribe” which will not allow participating in a group if I read the code correctly.
Has anyone solved this type of problem in past of active-active HA across 2 clusters using Kafka?
- Ashish On Wednesday, August 28, 2019, 6:52 PM, ashish pok <[hidden email]> wrote:
|
Hi Ashish, You are right. Flink does not use Kafka based group management. So if you have two clusters consuming the same topic, they will not divide the partitions. The cross cluster HA is not quite possible at this point. It would be good to know the reason you want to have such HA and see if Flink meets you requirement in another way. Thanks, Jiangjie (Becket) Qin On Thu, Aug 29, 2019 at 9:19 PM ashish pok <[hidden email]> wrote:
|
Thanks Becket,
Sorry for delayed response. That’s what I thought as well. I built a hacky custom source today directly using Kafka client which was able to join consumer group etc. which works as I expected but not sure about production readiness for something like that :) The need arises because of (1) Business continuity needs (2) Some of the pipelines we are building are close to network edge and need to run on nodes where we are not allowed to create cluster (yea - let’s not get into that can of security related worms :)). We will get there at some point but for now we are trying to support business continuity on those edge nodes by not actually forming a cluster but using “walled garden” individual Flink server. I fully understand this is not ideal. And all of this started because some of the work we were doing with Logstash needed to be migrated out as Logstash wasn’t able to keep up with data rates unless we put some ridiculous number of servers. In essence, we have pre-approved constraints to connect to Kafka and southbound interfaces using Logstash, which we need to replace for some datasets as they are massive for Logstash to keep up with. Hope that explains a bit where our head is at. Thanks, Ashish
|
Thanks for the explanation Ashish. Glad you made it work with custom source. I guess your application is probably stateless. If so, another option might be having a geo-distributed Flink deployment. That means there will be TM in different datacenter to form a single Flink cluster. This will also come with failover if one of the TM is down. I am not sure if anyone have tried this. It is probably a heavier solution than using Kafka to do the failover, but the good thing is that you may also do some stateful processing if you have a globally accessible storage for the state backup. Thanks, Jiangjie (Becket) Qin On Wed, Sep 4, 2019 at 11:00 AM Ashish Pokharel <[hidden email]> wrote:
|
Thanks - that sounds like a good model to at least explore. We are essentially stateless at this point for this particular need.
- Ashish On Tuesday, September 3, 2019, 11:28 PM, Becket Qin <[hidden email]> wrote:
|
Free forum by Nabble | Edit this page |