Questions about high-availability directory

Xinyu Zhang
Hi all

We recently started running Flink with high availability enabled. We found three kinds of directories under ha.baseDir: applicationID/blob, submittedJobGraph and completedcheckpoint. They store users' jars, submitted job graphs and completed checkpoints, respectively. When the old JobManager shuts down, the new JobManager can recover from this data. My questions are:
  1. This data is used to recover the JobManager, and each AM has only one JobManager, so why not put the data under directories such as "/applicationid/blob/jobid", "/applicationid/submittedJobGraph/jobid" and "/applicationid/completedcheckpoint/jobid"?
  2. Is there any way to make sure all of this data is cleaned up when a job or a cluster is shut down?
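
To make the layout proposed in question 1 concrete, here is a minimal sketch. The helper name, the base directory and the IDs are hypothetical and only illustrate the idea; none of this is Flink API.

```python
from pathlib import PurePosixPath

# The three kinds of HA data mentioned above.
HA_KINDS = ("blob", "submittedJobGraph", "completedcheckpoint")

def proposed_ha_paths(base_dir, application_id, job_id):
    """Hypothetical helper: build the per-application layout from question 1.

    Every kind of data lives under <base_dir>/<application_id>/<kind>/<job_id>,
    so all data of one application shares a single root directory.
    """
    root = PurePosixPath(base_dir) / application_id
    return {kind: str(root / kind / job_id) for kind in HA_KINDS}

# Example values are made up for illustration.
paths = proposed_ha_paths("/flink/ha", "application_1", "job_a")
for kind, path in paths.items():
    print(kind, "->", path)
```

With such a layout, cleaning up after a cluster shuts down would amount to deleting the single application root directory, which is the motivation behind question 2.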
Thanks!

Xinyu Zhang