In native k8s application mode, how can I know whether the job is failed or finished?


刘逍
Hi,

We are currently using Flink 1.6 in standalone mode, but the lack of
isolation is a headache for us. At the moment I am trying the application
mode of Flink 1.13.0 on native K8s.

I found that as soon as the job ends, whether normally or abnormally,
the JobManager can no longer be accessed, so the "flink list" command
cannot retrieve the final state of the job.

The K8s pods are also deleted immediately; "kubectl get pod" only shows
"Running", then "Terminating", and then "not found".

The Flink job needs to be managed by our internal scheduling system, so
I need to find a way to let the scheduling system know whether the job
ends normally or abnormally.
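
For reference, this is roughly what I run today (cluster id and jar path are just placeholders):

  # submit in application mode
  ./bin/flink run-application --target kubernetes-application \
      -Dkubernetes.cluster-id=my-job-cluster \
      local:///opt/flink/usrlib/my-job.jar

  # works while the job is running, fails once the JM pod is gone
  ./bin/flink list --target kubernetes-application \
      -Dkubernetes.cluster-id=my-job-cluster

  # only shows Running / Terminating, and then the pods are gone
  kubectl get pod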

Is there any way?

Re: In native k8s application mode, how can I know whether the job is failed or finished?

Xintong Song
There are two ways to access the status of a job after it is finished.

1. You can try the native K8s deployment in session mode. When jobs finish in this mode, the TMs are automatically released after a short period of time, while the JM is not terminated until you explicitly shut down the session cluster. Thus, the status of historical jobs can still be accessed via the JM.
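
A rough sketch of that workflow, with a placeholder cluster id and one of the bundled example jobs:

  # start a long-running session cluster on K8s
  ./bin/kubernetes-session.sh -Dkubernetes.cluster-id=my-session-cluster

  # submit jobs into the session
  ./bin/flink run --target kubernetes-session \
      -Dkubernetes.cluster-id=my-session-cluster \
      ./examples/streaming/TopSpeedWindowing.jar

  # the JM stays up, so finished/failed jobs can still be queried,
  # e.g. with "flink list -a" or the JM REST endpoint /jobs/overview
  ./bin/flink list -a --target kubernetes-session \
      -Dkubernetes.cluster-id=my-session-cluster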

2. You can try setting up a history server [1], where information about finished jobs can be archived.
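
A minimal sketch of that setup, assuming the JM and the history server share an archive directory (paths and port are just examples):

  # flink-conf.yaml
  jobmanager.archive.fs.dir: s3://flink/completed-jobs/
  historyserver.archive.fs.dir: s3://flink/completed-jobs/
  historyserver.web.port: 8082

  # start the standalone history server process
  ./bin/historyserver.sh start

Your scheduling system could then poll the history server's REST API (e.g. /jobs/overview) for the final state of each job.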

Thank you~

Xintong Song

Re: In native k8s application mode, how can I know whether the job is failed or finished?

刘逍
Thank you for the timely help!

I've tried session mode a little bit, and it's better than I thought: TaskManagers can be allocated and de-allocated dynamically. But it seems the memory size of the TaskManagers is fixed when the session starts and cannot be adjusted per job.
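
If I understand correctly, the TaskManager size comes from cluster-level options that are fixed when the session is started, e.g. (values are just examples):

  ./bin/kubernetes-session.sh \
      -Dkubernetes.cluster-id=my-session-cluster \
      -Dtaskmanager.memory.process.size=4096m \
      -Dkubernetes.taskmanager.cpu=2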

I'll try to deploy a history server on k8s later...