How to deploy Flink in a geo-distributed environment

Qian Ye
Hi,
Can Flink be deployed in a geo-distributed environment instead of being in local clusters?  
As far as I know, raw data has to be moved into a local cloud environment or a local cluster before Flink can process it. Consider a situation where the data sources are in different regions, possibly in different countries, so that moving the data over the WAN is slow and expensive. How can this problem be solved? Is there a solution for this today?

Thanks. 

Re: How to deploy Flink in a geo-distributed environment

Tzu-Li (Gordon) Tai
Hi,

It should be possible to deploy a single Flink cluster across geo-distributed nodes, but Flink currently offers no optimization for such a specific use case.
AFAIK, the general pattern for dealing with geographically distributed data sources right now is to replicate data across clusters, so that it all ends up in the same central destination before a processing framework such as Flink handles it.
Good designs for multi-cluster data replication across, say, multiple Kafka clusters would be out of scope for this mailing list, though.
Some quick googling led me to slides such as [1], but I'm sure there are more resources out there.
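To make the replicate-then-process pattern concrete, here is a toy, in-memory Python sketch. It is not Flink or Kafka code. Each region writes to its own local log, one replicator per region copies new records into a single central "aggregate" log, and one job processes only the central log. `Log`, `Replicator`, and `process` are hypothetical names chosen for illustration. In a real deployment the regional and central logs would typically be Kafka clusters, a tool such as Kafka's MirrorMaker would do the copying, and the central job would be a Flink job reading from the aggregate cluster.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class Log:
    """Hypothetical append-only log standing in for one Kafka topic/cluster."""
    name: str
    records: list = field(default_factory=list)

    def append(self, record):
        self.records.append(record)


@dataclass
class Replicator:
    """Hypothetical cross-region copier (the role MirrorMaker plays for Kafka).

    Tracks its own read offset in the source log, so each record is copied
    to the central log exactly once in this toy model (no failures simulated).
    """
    source: Log
    target: Log
    offset: int = 0

    def poll(self) -> int:
        new = self.source.records[self.offset:]
        for record in new:
            # Tag each record with its origin region before it crosses the WAN.
            self.target.append((self.source.name, record))
        self.offset += len(new)
        return len(new)


def process(central: Log) -> Counter:
    """Stand-in for the central streaming job: count events per origin region."""
    return Counter(region for region, _ in central.records)


regions = {name: Log(name) for name in ("eu", "us", "apac")}
central = Log("aggregate")
replicators = [Replicator(log, central) for log in regions.values()]

regions["eu"].append("click")
regions["eu"].append("view")
regions["us"].append("click")
for r in replicators:
    r.poll()

regions["apac"].append("view")
for r in replicators:
    r.poll()  # only the new apac record is copied; eu and us are already caught up

counts = process(central)
print(dict(counts))  # {'eu': 2, 'us': 1, 'apac': 1}
```

The point of the sketch is the separation of concerns: replication is the only step that moves data between regions, and because each replicator resumes from its own offset, only records not yet copied are shipped. The processing job then runs against one co-located data set, which is what lets it be deployed as an ordinary single-site cluster.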

Cheers,
Gordon

[1] https://www.slideshare.net/ConfluentInc/common-patterns-of-multi-datacenter-architectures-with-apache-kafka


On Wed, Jun 27, 2018 at 3:08 AM Stephen wrote:
Hi,
Can Flink be deployed in a geo-distributed environment instead of being in local clusters?  

Re: How to deploy Flink in a geo-distributed environment

Qian Ye
Thanks! 

On Thu, Jun 28, 2018 at 7:32 PM Tzu-Li (Gordon) Tai wrote:
Hi,

It should be possible to deploy a single Flink cluster across geo-distributed nodes, but Flink currently offers no optimization for such a specific use case.