Question about Reactive mode support

Sonam Mandal
Hello,

We were reading through FLIP-159 and FLIP-160 and are interested in reactive mode for auto-scaling. The listed limitations indicate that Flink 1.13 will support it only for standalone deployments running in application mode.

Will this be extended in future releases to other active deployments such as Native Flink on Kubernetes? What about session mode?

Thanks,
Sonam

Re: Question about Reactive mode support

rmetzger0
Hey Sonam,

I'm very happy to hear that you are interested in reactive mode. Your understanding of the limitations for 1.13 is correct. Note that you can deploy standalone Flink on Kubernetes [1]. I'm actually currently preparing a demo for this [2].

We are certainly aware that support for active deployments is a much desired feature. The "problem" with the 1.13 implementation of reactive mode is that it will try to acquire infinite resources from an active resource manager.

For integration with an active deployment, how would you like to control the scaling behavior of Flink? (for example via a REST API call to Flink's JobManager, or via a programmatic scaling policy, or a configured scaling policy? If you prefer a scaling policy, which metric would you like to consider?)

Best,
Robert

[2] https://github.com/rmetzger/flink-reactive-mode-k8s-demo (attention, this is really work in progress!)





Re: Question about Reactive mode support

Sonam Mandal
Hi Robert,

Thanks for getting back to me. We are currently assessing Flink Standalone on Kubernetes and Native Flink on Kubernetes and haven't yet decided on which model we intend to use. We want to ensure that whichever model we choose, we'll be able to get the benefits of the new features added by the community.

>> We are certainly aware that support for active deployments is a much desired feature. The "problem" with the 1.13 implementation of reactive mode is that it will try to acquire infinite resources from an active resource manager.

Good point, thanks for explaining why this is a challenge for active mode. I'm wondering whether it may be helpful to have a min and max parallelism, and the actual parallelism be determined by the scaling policy mentioned next?
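
A minimal sketch of the min/max idea, assuming a scaling policy that proposes a desired parallelism and a pair of user-configured bounds. The function and parameter names are hypothetical illustrations and are not part of any Flink API:

```python
def clamp_parallelism(desired: int, min_parallelism: int, max_parallelism: int) -> int:
    """Bound the parallelism proposed by a scaling policy.

    All names here are hypothetical; this only illustrates how a max bound
    would stop a scheduler from requesting unbounded resources.
    """
    if min_parallelism < 1 or max_parallelism < min_parallelism:
        raise ValueError("expected 1 <= min_parallelism <= max_parallelism")
    # Raise the proposal to the lower bound, then cap it at the upper bound.
    return max(min_parallelism, min(desired, max_parallelism))


# A policy asking for 64 subtasks is capped at the configured maximum.
print(clamp_parallelism(desired=64, min_parallelism=2, max_parallelism=16))
```

With bounds of 2 and 16, a proposal of 64 is capped at 16 and a proposal of 1 is raised to 2, so an active resource manager would only ever be asked for a finite number of slots.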

>> For integration with an active deployment, how would you like to control the scaling behavior of Flink? (for example via a REST API call to Flink's JobManager, or via a programmatic scaling policy, or a configured scaling policy? If you prefer a scaling policy, which metric would you like to consider?)

In the long term, I think having some kind of pluggable/extensible scaling policy would be best for users to allow flexibility in choosing metrics that are important for their use case. Making it configurable might make it easier to pick and choose different policies if they are available, without needing to make code changes.

Some possible metrics to start with could be resource utilization (CPU, memory) or other characteristics, such as how far the job is lagging.
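
One way to picture a pluggable, configurable policy is a small registry keyed by a policy name taken from configuration. This is only a sketch under stated assumptions: `JobMetrics`, the policy functions, the thresholds, and the registry are all hypothetical and do not correspond to anything in Flink:

```python
import math
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass(frozen=True)
class JobMetrics:
    """Hypothetical snapshot of the metrics a policy might consume."""
    cpu_utilization: float    # average across workers, 0.0 to 1.0
    lag_records: int          # backlog the job has not yet processed
    current_parallelism: int


# A policy maps a metrics snapshot to a desired parallelism.
ScalingPolicy = Callable[[JobMetrics], int]


def cpu_policy(m: JobMetrics, target: float = 0.5) -> int:
    # Scale proportionally so utilization moves toward the assumed target.
    return max(1, math.ceil(m.current_parallelism * m.cpu_utilization / target))


def lag_policy(m: JobMetrics, records_per_subtask: int = 10_000) -> int:
    # Scale up only: one subtask per assumed chunk of backlog.
    return max(m.current_parallelism, math.ceil(m.lag_records / records_per_subtask))


# Registry that lets a configured name pick the policy without code changes.
POLICIES: Dict[str, ScalingPolicy] = {"cpu": cpu_policy, "lag": lag_policy}


def select_policy(name: str) -> ScalingPolicy:
    try:
        return POLICIES[name]
    except KeyError:
        raise ValueError(f"unknown scaling policy: {name!r}") from None


metrics = JobMetrics(cpu_utilization=0.75, lag_records=50_000, current_parallelism=4)
print(select_policy("cpu")(metrics), select_policy("lag")(metrics))
```

Adding a new metric would then mean registering another function under a new name, while users switch between policies through configuration alone.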

Since we are in early stages of just assessing what kind of deployment model we'd like to use, it's hard to say what will work best for us. We just want to see if reactive mode will be available in the future so that we can leverage it when we have more data.

Thanks,
Sonam




Re: Question about Reactive mode support

rmetzger0
Hey Sonam,

>> I'm wondering whether it may be helpful to have a min and max parallelism, and the actual parallelism be determined by the scaling policy mentioned next?

Yes, that's certainly possible.

Thanks a lot for your input on the design of a scaling policy. Your input is very valuable for scoping the evolution of reactive mode for Flink 1.14.
Once I know more about the planning for the 1.14 release, I can tell you where we are headed. Support for the active resource managers is very high on the list in my opinion.

Best,
Robert

