Task-manager kubernetes pods take a long time to terminate

Posted by Li Peng-2 on
URL: http://deprecated-apache-flink-user-mailing-list-archive.369.s1.nabble.com/Task-manager-kubernetes-pods-take-a-long-time-to-terminate-tp32479.html

Hey folks, I'm deploying a Flink cluster via Kubernetes and starting each task manager with taskmanager.sh. I noticed that when I tell kubectl to delete the deployment, the job-manager pod usually terminates very quickly, but any task-manager pod that isn't terminated before the job manager usually gets stuck in this loop:

2020-01-29 09:18:47,867 INFO  org.apache.flink.runtime.taskexecutor.TaskExecutor            - Could not resolve ResourceManager address akka.tcp://flink@job-manager:6123/user/resourcemanager, retrying in 10000 ms: Could not connect to rpc endpoint under address akka.tcp://flink@job-manager:6123/user/resourcemanager

It keeps doing this for what seems like about 10 minutes and then shuts down. If I deploy a new cluster in the meantime, the old pod tries to register itself with the new job manager before terminating later. This isn't a serious problem as far as I can tell, but it's annoying that I sometimes have to force delete the pods.
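
To put a rough number on that loop, here is a small Python sketch that pulls the retry delay out of the log message above and estimates how many attempts fit into the window. The 10-minute figure is only my approximation, and the regex and both helper names (parse_retry, estimated_attempts) are made up for this illustration; they are not part of Flink.

```python
import re

# Message text copied from the task manager log above (timestamp and logger trimmed).
LOG_LINE = (
    "Could not resolve ResourceManager address "
    "akka.tcp://flink@job-manager:6123/user/resourcemanager, "
    "retrying in 10000 ms: Could not connect to rpc endpoint under address "
    "akka.tcp://flink@job-manager:6123/user/resourcemanager"
)

# Hypothetical pattern written only for this message format.
RETRY_PATTERN = re.compile(
    r"address (?P<address>\S+), retrying in (?P<delay_ms>\d+) ms"
)


def parse_retry(line):
    """Return (address, delay_ms) from a retry message, or None if it does not match."""
    match = RETRY_PATTERN.search(line)
    if match is None:
        return None
    return match.group("address"), int(match.group("delay_ms"))


def estimated_attempts(window_minutes, delay_ms):
    """Rough count of retries in the window, ignoring the time each attempt takes."""
    return (window_minutes * 60 * 1000) // delay_ms


address, delay_ms = parse_retry(LOG_LINE)
# Assumption: the pod keeps retrying for about 10 minutes before it gives up.
print(address, delay_ms, estimated_attempts(10, delay_ms))
```

With the 10000 ms delay from the log, that works out to around 60 failed connection attempts per pod before it shuts down, assuming the 10-minute estimate is close.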

Is there an easy way to have the task managers terminate gracefully and quickly?

Thanks,
Li