TaskManager HA on YARN

Marchant, Hayden
Hi,

We are currently starting to test Flink running on YARN. Until now, we've been testing on a standalone cluster. One thing lacking in standalone mode is that we have to manually restart a TaskManager if it dies. I looked at https://ci.apache.org/projects/flink/flink-docs-release-1.3/setup/jobmanager_high_availability.html#yarn-cluster-high-availability and see that YARN handles HA for the JobManager. How does it deal with a TaskManager that dies? I would like a failed TaskManager to be handled similarly to a failed JobManager. For example, say I have a cluster with two TaskManagers, and one of them dies. Will YARN restart the dead TaskManager, or would that need a manual restart?

What would actually happen in the above case?

Thanks,
Hayden


Re: TaskManager HA on YARN

Till Rohrmann
Hi Hayden,

In YARN mode, Flink tolerates as many TaskManager (TM) failures as you have configured via `yarn.maximum-failed-containers`. By default, this is set to the initial number of requested TMs. So in your case, with two TMs, Flink would restart a failed TM twice and then fail the cluster once a TM fails for the third time.
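
To make the counting concrete, here is a minimal Python sketch of the rule as described in this reply. It is only a model, not Flink code: the function name `simulate_tm_failures` and its parameters are hypothetical, and the default (limit equals the initial number of TMs) is taken from the explanation above.

```python
def simulate_tm_failures(initial_tms, failures, max_failed_containers=None):
    """Model the TM failure budget described in this thread (not Flink's code).

    Returns one outcome per TM failure, stopping once the cluster fails.
    """
    # Assumed default, per the explanation above: the initial number of requested TMs.
    if max_failed_containers is None:
        max_failed_containers = initial_tms

    outcomes = []
    for failed_so_far in range(1, failures + 1):
        if failed_so_far <= max_failed_containers:
            # Still within the budget: the failed container is replaced.
            outcomes.append("restart TM")
        else:
            # Budget exhausted: the whole cluster is failed.
            outcomes.append("fail cluster")
            break
    return outcomes


if __name__ == "__main__":
    # Two TMs requested, default limit of 2, three TM failures in total.
    for n, outcome in enumerate(simulate_tm_failures(initial_tms=2, failures=3), start=1):
        print(f"failure {n}: {outcome}")
```

Under this model, a cluster started with two TMs survives two TM failures and is failed on the third. Tolerating more failures would presumably mean setting `yarn.maximum-failed-containers` to a larger value, for example in `flink-conf.yaml`.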

Cheers,
Till

On Mon, Dec 4, 2017 at 11:31 AM, Marchant, Hayden <[hidden email]> wrote:
Hi,

We are currently starting to test Flink running on YARN. Until now, we've been testing on a standalone cluster. One thing lacking in standalone mode is that we have to manually restart a TaskManager if it dies. I looked at https://ci.apache.org/projects/flink/flink-docs-release-1.3/setup/jobmanager_high_availability.html#yarn-cluster-high-availability and see that YARN handles HA for the JobManager. How does it deal with a TaskManager that dies? I would like a failed TaskManager to be handled similarly to a failed JobManager. For example, say I have a cluster with two TaskManagers, and one of them dies. Will YARN restart the dead TaskManager, or would that need a manual restart?

What would actually happen in the above case?

Thanks,
Hayden