Flink on YARN - how to resize a running cluster?

Posted by Josh on
URL: http://deprecated-apache-flink-user-mailing-list-archive.369.s1.nabble.com/Flink-on-YARN-how-to-resize-a-running-cluster-tp7729.html

I'm running a Flink cluster as a YARN application, started by:
./bin/yarn-session.sh -n 4 -jm 2048 -tm 4096 -d

There are 2 worker nodes, so each are allocated 2 task managers. There is a stateful Flink job running on the cluster with a parallelism of 2.

If I now want to increase the number of worker nodes to 3, and add 2 extra task managers, and then increase the job parallelism, how should I do this?

I'm using EMR, so adding an extra worker node and making it available to YARN is easy to do via the AWS console. But I haven't been able to find any information in Flink docs about how to resize a running Flink cluster on YARN. Is it possible to resize it while the YARN application is running, or do I need to stop the YARN application and redeploy the cluster? Also do I need to redeploy my Flink job from a savepoint to increase its parallelism, or do I do this while the job is running?

I tried redeploying the cluster having added a third worker node, via:

> yarn application -kill myflinkcluster

> ./bin/yarn-session.sh -n 6 -jm 2048 -tm 4096 -d

(note increasing the number of task managers from 4 to 6)

Surprisingly, this redeployed a Flink cluster with 4 task mangers (not 6!) and restored my job from the last checkpoint.

Can anyone point me in the right direction?

Thanks,

Josh