Hi Experts,
I have a Flink cluster (per-job mode) running on Kubernetes. The job is configured with a restart strategy, so after 3 retries the job is marked as FAILED and the pods stop running. However, Kubernetes then restarts the job again because the number of available replicas does not match the desired count. What are the suggestions for such a scenario? How should I configure a Flink job running on K8s?

Thanks a lot!
Eleanore
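For reference, a restart strategy of this kind, assuming the fixed-delay variant, would look roughly like this in flink-conf.yaml (the delay value is illustrative):

    restart-strategy: fixed-delay
    restart-strategy.fixed-delay.attempts: 3   # job transitions to FAILED after 3 failed retries
    restart-strategy.fixed-delay.delay: 10 s   # illustrative back-off between attempts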
Hi Eleanore,

How are you deploying Flink exactly? Are you using the application mode with native K8s support to deploy a cluster [1], or are you manually deploying a per-job mode [2]? I believe the problem might be that we terminate the Flink process with a non-zero exit code if the job reaches ApplicationStatus.FAILED [3].

cc Yang Wang: have you observed similar behavior when running Flink in per-job mode on K8s?

On Fri, Jul 31, 2020 at 6:26 PM Eleanore Jin <[hidden email]> wrote:
Hi Till,

Thanks for the reply! I manually deploy in per-job mode [1] and I am using Flink 1.8.2. Specifically, I build a custom Docker image into which I copy the app jar (not an uber jar) and all of its dependencies under /flink/lib.

So my question is: in this case, if the job is marked as FAILED, which causes K8s to restart the pod, the restart does not help at all. What are the suggestions for such a scenario?

Thanks a lot!
Eleanore

On Mon, Aug 3, 2020 at 2:13 AM Till Rohrmann <[hidden email]> wrote:
Hi Eleanore,

I think you are using the K8s resource "Job" to deploy the jobmanager. Please set .spec.template.spec.restartPolicy = "Never" and spec.backoffLimit = 0. Refer here [1] for more information. Then, when the jobmanager fails for any reason, the K8s Job will be marked as failed and K8s will not restart it again.

Best,
Yang

On Tue, Aug 4, 2020 at 12:05 AM Eleanore Jin <[hidden email]> wrote:
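A minimal sketch of such a Job manifest (resource and image names below are placeholders; command/args stay whatever the existing jobmanager image uses):

    apiVersion: batch/v1
    kind: Job
    metadata:
      name: flink-jobmanager
    spec:
      backoffLimit: 0            # do not create replacement pods after a failure
      template:
        spec:
          restartPolicy: Never   # do not restart the failed container in place
          containers:
            - name: jobmanager
              image: my-flink-job:1.8.2   # placeholder for the custom image
              # command/args and ports as in the current jobmanager spec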
[hidden email] I believe that we should rethink the exit codes of Flink. In general you want K8s to restart a failed Flink process. Hence, an application which terminates in state FAILED should not return a non-zero exit code, because it is a valid termination state.

Cheers,
Till

On Tue, Aug 4, 2020 at 8:55 AM Yang Wang <[hidden email]> wrote:
[hidden email] In native mode, when a Flink application terminates in the FAILED state, all the resources are cleaned up. However, in standalone mode, I agree with you that we need to rethink the exit code of Flink. When a job exhausts the restart strategy, we should terminate the pod and not restart it again. After googling, it seems that we cannot specify the restartPolicy based on the exit code [1]. So maybe we need to return a zero exit code to avoid being restarted by K8s.

Best,
Yang

On Tue, Aug 4, 2020 at 3:48 PM Till Rohrmann <[hidden email]> wrote:
Hi Yang & Till,

Thanks for your prompt replies! Yang, regarding your question: I am actually not using a K8s Job, since I put my app.jar and its dependencies under Flink's lib directory. I have one K8s Deployment for the job manager, one K8s Deployment for the task managers, and one K8s Service for the job manager. As you mentioned above, if the Flink job is marked as FAILED, the job manager pod gets restarted, which is not the ideal behavior.

Do you suggest that I change the deployment strategy from a K8s Deployment to a K8s Job, so that when the Flink program exits with a non-zero code (e.g. after exhausting the configured number of restarts), the pod can be marked as complete and the job is not restarted again?

Thanks a lot!
Eleanore

On Tue, Aug 4, 2020 at 2:49 AM Yang Wang <[hidden email]> wrote:
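As a side note, a Deployment's pod template only accepts restartPolicy: Always, so the controller will always bring the jobmanager pod back to match the desired replica count. A stripped-down sketch of such a jobmanager Deployment (names and image are placeholders):

    apiVersion: apps/v1
    kind: Deployment
    metadata:
      name: flink-jobmanager
    spec:
      replicas: 1                   # the controller recreates pods until this count is met
      selector:
        matchLabels:
          app: flink
          component: jobmanager
      template:
        metadata:
          labels:
            app: flink
            component: jobmanager
        spec:
          restartPolicy: Always     # the only value a Deployment allows
          containers:
            - name: jobmanager
              image: my-flink-job:1.8.2   # placeholder image
              # command/args, ports, and the matching Service omitted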
Hi Eleanore,

Yes, I suggest using a Job to replace the Deployment. It can be used to run the jobmanager once and finish after a successful/failed completion. However, using a Job still does not solve your problem completely. Just as Till said, when a job exhausts the restart strategy, the jobmanager pod terminates with a non-zero exit code, which causes K8s to restart it again. Even though we could set the restartPolicy and backoffLimit, this is not a clean and correct way to go. We should terminate the jobmanager process with a zero exit code in such a situation.

[hidden email] I just have one concern. Is this a special case for K8s deployments? For standalone/YARN/Mesos, it seems that terminating with a non-zero exit code is harmless.

Best,
Yang

On Tue, Aug 4, 2020 at 11:54 PM Eleanore Jin <[hidden email]> wrote:
Yes, for the other deployments it is not a problem. One reason why people preferred non-zero exit codes for FAILED jobs is that they are easier to monitor than having to look at the actual job result. Moreover, in the YARN web UI the application shows as failed, if I am not mistaken. However, from the framework's perspective, a FAILED job does not mean that Flink has failed and, hence, the return code could still be 0 in my opinion.

Cheers,
Till

On Wed, Aug 5, 2020 at 9:30 AM Yang Wang <[hidden email]> wrote:
Actually, the application status shown in the YARN web UI is not determined by the jobmanager process exit code. Instead, we use "resourceManagerClient.unregisterApplicationMaster" to control the final status of the YARN application. So even if the jobmanager exits with a zero code, the application can still show a failed status in the YARN web UI. I have created a ticket to track this improvement [1].

Best,
Yang

On Wed, Aug 5, 2020 at 3:56 PM Till Rohrmann <[hidden email]> wrote:
You are right, Yang Wang. Thanks for creating this issue.

Cheers,
Till

On Wed, Aug 5, 2020 at 1:33 PM Yang Wang <[hidden email]> wrote:
Hi Yang and Till,

Thanks a lot for the help! I have the same concern Till mentioned: if we do not fail Flink pods when the restart strategy is exhausted, it might be hard to monitor such failures. Today I get alerts if the K8s pods are restarted or in a crash loop, but if that is no longer the case, how can we handle the monitoring? In production I have hundreds of small Flink jobs running (2-8 TM pods each) doing stateless processing, and it is really hard for us to expose an ingress for each JM REST endpoint to periodically query the job status of each Flink job.

Thanks a lot!
Eleanore

On Wed, Aug 5, 2020 at 4:56 AM Till Rohrmann <[hidden email]> wrote:
Hi Eleanore,

From my experience, collecting the Flink metrics into Prometheus via a metrics reporter is the better way, and it also makes it easier to configure alerts. Maybe you could use "fullRestarts" or "numRestarts" to monitor job restarts. More metrics can be found here [2].

Best,
Yang

On Wed, Aug 5, 2020 at 11:52 PM Eleanore Jin <[hidden email]> wrote:
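A minimal sketch of that setup, assuming Flink's bundled PrometheusReporter (for 1.8 the flink-metrics-prometheus jar needs to be in /flink/lib) and an illustrative alert on the restart metric; the exported metric name assumes the default scope formats:

    # flink-conf.yaml
    metrics.reporter.prom.class: org.apache.flink.metrics.prometheus.PrometheusReporter
    metrics.reporter.prom.port: 9249

    # Prometheus alert rule (illustrative thresholds)
    groups:
      - name: flink
        rules:
          - alert: FlinkJobRestarting
            expr: delta(flink_jobmanager_job_fullRestarts[15m]) > 0
            for: 5m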
Hi Yang,

Thanks a lot for the information!

Eleanore

On Thu, Aug 6, 2020 at 4:20 AM Yang Wang <[hidden email]> wrote: