Deployment Architecture for Flink Applications

Chakravarthy varaga
Hi Team,

    We are analysing different deployment options for managing Flink Jobs on AWS EC2 instances.

     Basically, the options (resource managers) in front of us are:
     -> Standalone cluster
     -> On YARN
     -> Deploy using Mesos/Marathon
     -> Deploy using Kubernetes/Docker
    
     The resource manager options are a bit confusing, and we are unable to decide which one to go with. The inputs to our analysis are:
    ->  Dynamic Scaling of resources
    ->  Resource Allocation
    ->  Job Scheduling
    ->  No-Downtime upgrades
    ->  Monitoring & Metrics.

    Right now our plan is to do a paper-based study evaluating these options.
 
    I'm sure a lot of you in production/support have encountered issues around these. Can someone point us to blogs, research papers, or other material focussing on the approaches taken and the considerations for evaluation?

    Any help here is highly appreciated!

Best Regards
CVP
      
Re: Deployment Architecture for Flink Applications

Kostas Kloudas
Hi CVP,

On how people use Flink, you can check this blogpost to see how Alibaba does it:

In addition, you can find more information on the matter in the talks from 
the last Flink Forward conference: http://berlin.flink-forward.org/program/sessions/

For example Netflix also shares some information here: 

Now for how things work under the hood, I will provide links to the Flink documentation. 
I hope that this will also help you figure out what fits your needs best:

For deployment and operations, the main resource is the Flink documentation, 

and for what is about to come on that front, you can check out the FLIP-6 page:

To dynamically scale your Flink job you have to take a savepoint and restart your job with different parallelism.
You can find some details here: https://www.slideshare.net/tillrohrmann/dynamic-scaling-how-apache-flink-adapts-to-changing-workloads. Unfortunately, this talk is a little outdated. We will update our documentation on dynamic scaling soon.
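
As a rough illustration of the savepoint-and-restart approach described above, here is a minimal Python sketch that only builds the two Flink CLI invocations involved (`flink cancel -s` to take a savepoint and cancel, then `flink run -s ... -p ...` to resume with a new parallelism). The helper names, the job ID, the directories and the jar name are all hypothetical placeholders, and the sketch assumes the `flink` executable is on the PATH. Nothing is executed; the commands are just printed.

```python
import shlex
from typing import List

FLINK_BIN = "flink"  # assumption: the Flink CLI is available on PATH


def cancel_with_savepoint_cmd(job_id: str, savepoint_dir: str) -> List[str]:
    """Build the CLI call that takes a savepoint and then cancels the job."""
    return [FLINK_BIN, "cancel", "-s", savepoint_dir, job_id]


def resume_with_parallelism_cmd(savepoint_path: str, parallelism: int, jar: str) -> List[str]:
    """Build the CLI call that restarts the job from a savepoint with a new parallelism."""
    if parallelism < 1:
        raise ValueError("parallelism must be at least 1")
    # Note: the new parallelism must not exceed the job's configured max parallelism.
    return [FLINK_BIN, "run", "-s", savepoint_path, "-p", str(parallelism), jar]


def render(cmd: List[str]) -> str:
    """Render an argument list as a shell-safe string for display."""
    return " ".join(shlex.quote(part) for part in cmd)


if __name__ == "__main__":
    # Hypothetical values, for illustration only.
    print(render(cancel_with_savepoint_cmd("<jobId>", "s3://my-bucket/savepoints")))
    print(render(resume_with_parallelism_cmd("<savepointPath>", 8, "my-job.jar")))
```

The savepoint path to pass to the second command is the one reported by the first; to actually run either command, the argument list could be handed to `subprocess.run(cmd, check=True)`.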

For resource allocation and job scheduling, you can check the links I included for deployment and operations,

and the related pages in the Debugging and monitoring section of the Flink documentation.

I hope this can help as a first step,
Kostas


On Feb 22, 2017, at 12:30 PM, Chakravarthy varaga <[hidden email]> wrote:

Hi Team,

    We are analysing different deployment options for managing Flink Jobs on AWS EC2 instances.
