Hi Linlin,
There may be a historic confusion in terminology.
We often refer to 'JobManager' as a component which manages a single job. Names of all related classes usually contain 'JobManager'.
At the same time, we can refer to it as a master process in Flink's cluster, potentially running multiple jobs.
Currently, the master process is a single JVM process (ClusterEntryPoint) running on one machine and it consists of multiple components:
- dispatcher
- resource manager
- job manager per job or one job manager in case of per-job-cluster
All components are designed in a way that they can be separate remote JVM processes, potentially running on different machines, it is just not the case now.
If they are separate remote processes then each of them needs its own leader service to discover each other. This is why it is implemented like you see it.
Best,
Andrey
Hi,
I'm reading flink runtime source code recently, and I found that when create DispatcherResourceManagerComponent in {@link org.apache.flink.runtime.entrypoint.component.DefaultDispatcherResourceManagerComponentFactory#create}, it will start DispatcherLeaderRetrieverService and ResourceManagerLeaderRetrieverService which is in order to watch the leader address change.
And my question is - as this piece of code is part of jobmanager starting process which is meant to be the leader, then why does it watch the leader itself?
dispatcherLeaderRetrievalService = highAvailabilityServices.getDispatcherLeaderRetriever();
resourceManagerRetrievalService = highAvailabilityServices.getResourceManagerLeaderRetriever();
Thank