Flink on Kubernetes, Task/Job Manager Recycles

Julian Cardarelli (CA)

Hello,

I am running some tests with Flink on Kubernetes. Every five to ten days or so, all the jobs disappear from Running Jobs. There is nothing under Completed Jobs, and there is no record of the submitted JAR files in the cluster.

In effect, it looks like a fresh Flink installation, and I think that is probably what is happening.

Is there a persistent volume or something similar that needs to be set up so that state is maintained across what seems to be a pod restart? If so, it is not clear to me from the docs where to add it.

Thank you

___
Julian Cardarelli
CEO
Re: Flink on Kubernetes, Task/Job Manager Recycles

Yang Wang
I think you need to enable HA (high availability) for your Flink cluster [1]. Currently,
we have the ZooKeeperHAService and the KubernetesHAService. In HA mode,
all the metadata (e.g. job graph path, checkpoint counter, checkpoint path) is
stored in ZooKeeper or in Kubernetes ConfigMaps, and the actual HA data (e.g. user artifacts,
checkpoints) is stored on distributed storage (e.g. HDFS, S3). See [2] for
more information about how HA works.

You could also use a persistent volume for the HA data storage. Please note that
all JobManagers and TaskManagers need to mount the same PV.
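
As a rough sketch of what this could look like in flink-conf.yaml when using the Kubernetes HA service (Flink 1.12 or later): the cluster id, bucket name, and mount path below are placeholder values assumed for illustration, not taken from this thread.

```yaml
# Assumed placeholder: a fixed id so that a restarted JobManager finds the same HA ConfigMaps
kubernetes.cluster-id: my-flink-cluster

# Use the Kubernetes ConfigMap based HA service (the ZooKeeper alternative is "zookeeper")
high-availability: org.apache.flink.kubernetes.highavailability.KubernetesHaServicesFactory

# Where the actual HA data (job graphs, user artifacts, checkpoint pointers) is written.
# Assumed placeholder bucket; any storage reachable from all pods should work.
high-availability.storageDir: s3://my-bucket/flink/recovery

# Alternative (assumption): a directory on a PV mounted at the same path in
# every JobManager and TaskManager pod, for example:
# high-availability.storageDir: file:///flink-ha/recovery
```

With the ConfigMap based service, the service account used by the Flink pods also needs permission to create, edit, and delete ConfigMaps.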


Best,
Yang

In reply to Julian Cardarelli (CA) <[hidden email]>, message of Friday, January 29, 2021, 7:00 AM.
