Dynamic rescaling in Flink

classic Classic list List threaded Threaded
4 messages Options
Reply | Threaded
Open this post in threaded view
|

Dynamic rescaling in Flink

Prasanna kumar
Hi all,

Does flink support dynamic scaling. Say try to add/reduce nodes based upon incoming load.

Because our use case is such that we get peak loads for 4 hours and then medium loads for 8 hours and then light to no load for rest 2 hours.

Or peak load would be atleast 5 times the medium load. 

Has anyone used flink in these type of scenario? We are looking at flink for it's low latency performance. 

Earlier I worked with Spark+YARN which provides a features to dynamicaly add/reduce executors.

Wanted to know the same on flink.

Thanks,
Prasanna
Reply | Threaded
Open this post in threaded view
|

Re: Dynamic rescaling in Flink

Xintong Song
Hi Prasanna,

Flink does not support dynamic rescaling at the moment.

AFAIK, there are some companies in China already have solutions for dynamic scaling Flink jobs (Alibaba, 360, etc.), but none of them are yet available to the community version. These solutions rely on an external system to monitor the workload and rescale the job accordingly. In case of rescaling, it requires a full stop of the data processing, then rescale, then recover from the most recent checkpoint.

The Flink community is also preparing a declarative resource management approach, which should allow the job to dynamically adapt to the available resources (e.g., add/reduce pods on kubernetes). AFAIK, this is still in the design discussion.

Thank you~

Xintong Song



On Wed, Jun 10, 2020 at 2:44 AM Prasanna kumar <[hidden email]> wrote:
Hi all,

Does flink support dynamic scaling. Say try to add/reduce nodes based upon incoming load.

Because our use case is such that we get peak loads for 4 hours and then medium loads for 8 hours and then light to no load for rest 2 hours.

Or peak load would be atleast 5 times the medium load. 

Has anyone used flink in these type of scenario? We are looking at flink for it's low latency performance. 

Earlier I worked with Spark+YARN which provides a features to dynamicaly add/reduce executors.

Wanted to know the same on flink.

Thanks,
Prasanna
Reply | Threaded
Open this post in threaded view
|

Re: Dynamic rescaling in Flink

Prasanna kumar
Thanks Xintong and Yu Yang for the replies, 

I see AWS provides deploying Flink on EMR out of the box. There they have an option of EMR cluster scaling based on the load.

Is this not equal to dynamic rescaling ? 

Screen Shot 2020-06-15 at 9.23.24 AM.png



Let me know your thoughts on the same.

Prasanna.


On Wed, Jun 10, 2020 at 7:33 AM Xintong Song <[hidden email]> wrote:
Hi Prasanna,

Flink does not support dynamic rescaling at the moment.

AFAIK, there are some companies in China already have solutions for dynamic scaling Flink jobs (Alibaba, 360, etc.), but none of them are yet available to the community version. These solutions rely on an external system to monitor the workload and rescale the job accordingly. In case of rescaling, it requires a full stop of the data processing, then rescale, then recover from the most recent checkpoint.

The Flink community is also preparing a declarative resource management approach, which should allow the job to dynamically adapt to the available resources (e.g., add/reduce pods on kubernetes). AFAIK, this is still in the design discussion.

Thank you~

Xintong Song



On Wed, Jun 10, 2020 at 2:44 AM Prasanna kumar <[hidden email]> wrote:
Hi all,

Does flink support dynamic scaling. Say try to add/reduce nodes based upon incoming load.

Because our use case is such that we get peak loads for 4 hours and then medium loads for 8 hours and then light to no load for rest 2 hours.

Or peak load would be atleast 5 times the medium load. 

Has anyone used flink in these type of scenario? We are looking at flink for it's low latency performance. 

Earlier I worked with Spark+YARN which provides a features to dynamicaly add/reduce executors.

Wanted to know the same on flink.

Thanks,
Prasanna
Reply | Threaded
Open this post in threaded view
|

Re: Dynamic rescaling in Flink

Xintong Song
Hi Prasanna,

IIUC, your screenshot shows the scaling feature of an EMR cluster, not Flink.

Let me try to better understand your question. Which kind of rescaling do you need?
- If you deploy a long running streaming job, and want it to dynamically rescale based on the real-time incoming data stream. Flink does not support it at the moment.
- If you have various jobs, and the amount of jobs need to be executed changes along the time, this can be supported in either of the following ways.
  - You can submit your workloads as Flink Single Jobs[1]. In this way, you can simply rescale your EMR cluster, and Flink does not need to be aware of that. 
  - You can deploy a Flink Session[2], and submit your jobs to this session. In this way, Flink will automatically request new workers from and release idle workers to Yarn.

AFAIK, AWS EMR provides out-of-box Flink integration only for the single job mode. The session mode is not supported. But I haven't checked this for quite a while. It could have been changed.

Thank you~

Xintong Song



On Mon, Jun 15, 2020 at 11:55 AM Prasanna kumar <[hidden email]> wrote:
Thanks Xintong and Yu Yang for the replies, 

I see AWS provides deploying Flink on EMR out of the box. There they have an option of EMR cluster scaling based on the load.

Is this not equal to dynamic rescaling ? 

Screen Shot 2020-06-15 at 9.23.24 AM.png



Let me know your thoughts on the same.

Prasanna.


On Wed, Jun 10, 2020 at 7:33 AM Xintong Song <[hidden email]> wrote:
Hi Prasanna,

Flink does not support dynamic rescaling at the moment.

AFAIK, there are some companies in China already have solutions for dynamic scaling Flink jobs (Alibaba, 360, etc.), but none of them are yet available to the community version. These solutions rely on an external system to monitor the workload and rescale the job accordingly. In case of rescaling, it requires a full stop of the data processing, then rescale, then recover from the most recent checkpoint.

The Flink community is also preparing a declarative resource management approach, which should allow the job to dynamically adapt to the available resources (e.g., add/reduce pods on kubernetes). AFAIK, this is still in the design discussion.

Thank you~

Xintong Song



On Wed, Jun 10, 2020 at 2:44 AM Prasanna kumar <[hidden email]> wrote:
Hi all,

Does flink support dynamic scaling. Say try to add/reduce nodes based upon incoming load.

Because our use case is such that we get peak loads for 4 hours and then medium loads for 8 hours and then light to no load for rest 2 hours.

Or peak load would be atleast 5 times the medium load. 

Has anyone used flink in these type of scenario? We are looking at flink for it's low latency performance. 

Earlier I worked with Spark+YARN which provides a features to dynamicaly add/reduce executors.

Wanted to know the same on flink.

Thanks,
Prasanna