Hi,
We have a computation-heavy streaming Flink job that processes around 100 million messages at peak time and around 1 million messages off-peak. We need the computation operator to scale up and down with the workload, without restarting the job, so that we can lower machine costs.

Is there an ETA for the feature that rescales a single operator without a restart?

Is it possible to auto-scale one of the operators with Docker Swarm or Amazon ECS auto scaling based on Kafka consumer lag or CPU consumption? If so, could you point me to documentation or steps to achieve this behaviour?

Also, is there any documentation on what the job manager does apart from scheduling and reporting status? Since there is just one job manager, we wanted to check whether there would be any scaling limitations as the processing capacity increases.

Thanks
Govind
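On the consumer-lag question: one lightweight way to produce a scale-up signal is to sum the LAG column of Kafka's consumer-group describe output and compare it to a threshold. A minimal sketch (the column layout of `kafka-consumer-groups.sh --describe` varies between Kafka versions, so the LAG column is located by its header name; the broker address, group name, and threshold below are placeholders, not values from this thread):

```shell
# Read `kafka-consumer-groups.sh --describe` output on stdin, sum the LAG
# column, and print "scale-up" when the total exceeds the threshold.
check_lag() {
  local threshold="$1"
  local total
  total=$(awk '
    # On the header line, find which column is LAG, then skip the header.
    { for (i = 1; i <= NF; i++) if ($i == "LAG") { lagcol = i; next } }
    # On data lines, add up numeric LAG values.
    lagcol && $lagcol ~ /^[0-9]+$/ { sum += $lagcol }
    END { print sum + 0 }')
  if [ "$total" -gt "$threshold" ]; then
    echo scale-up
  else
    echo scale-ok
  fi
}

# Example usage (hypothetical broker and consumer group):
# kafka-consumer-groups.sh --bootstrap-server broker:9092 \
#     --describe --group my-flink-job | check_lag 100000
```

The same pattern works for any metric you can turn into a number on stdin; CPU-based triggers would come from your container platform's own metrics instead.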
Hi All,
It would be great if someone could help me with my questions. Appreciate all the help. Thanks.

> On Dec 23, 2016, at 12:11 PM, Govindarajan Srinivasaraghavan <[hidden email]> wrote:
> [original message quoted above]
Hi Govind,

In Flink 1.2 (feature complete, undergoing testing) you will be able to scale your jobs/operators up and down at will, but you'll have to build a little tooling around it yourself and scale based on your own metrics. You should be able to integrate this with Docker Swarm or Amazon auto-scaling groups, though I haven't done it myself yet. The way this will really work is the following sequence:

1) Detect that you want to scale up or down (this part is up to you).
2) Cancel the Flink job with a savepoint -- this shuts the job down with a savepoint in such a way that no messages will need to be reprocessed.
3) Launch the Flink job from the savepoint with a different parallelism.

That's it. You should be able to script this such that the whole process takes just a couple of seconds. What I don't have for you right now is any sort of DIRECT integration with Docker or Amazon for scaling; you have to trigger this procedure yourself based on metrics, etc.

Do you think this will work for your use case?
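The three steps above can be scripted against the Flink CLI. A minimal sketch (the job ID, jar path, and savepoint directory are placeholders, and the exact text the CLI prints when storing a savepoint may differ between Flink versions, so the path parsing below is an assumption):

```shell
#!/usr/bin/env bash
# Rescale a running Flink job: cancel it with a savepoint, then resubmit
# the same jar from that savepoint with a new parallelism.
set -euo pipefail

rescale_job() {
  local job_id="$1" new_parallelism="$2" jar="$3" savepoint_dir="$4"

  # Step 2: cancel with a savepoint and parse the savepoint path the CLI prints.
  local savepoint
  savepoint=$(flink cancel -s "$savepoint_dir" "$job_id" \
              | grep -o "$savepoint_dir/savepoint-[0-9a-z-]*" | head -n 1)

  # Step 3: resubmit (detached) from the savepoint with the new parallelism.
  flink run -d -s "$savepoint" -p "$new_parallelism" "$jar"
}

# Example usage (hypothetical job ID and paths):
# rescale_job deadbeef1234 16 /opt/jobs/my-job.jar /data/savepoints
```

Step 1, the trigger, would wrap this function in whatever metric check you choose (consumer lag, CPU, etc.) and call it with a higher or lower parallelism.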