(DEPRECATED) Apache Flink User Mailing List archive.

Purpose of starting LeaderRetrievalService in DefaultDispatcherResourceManagerComponentFactory#create

Classic

List

Threaded

2 messages Options

Linlin Zhou

Purpose of starting LeaderRetrievalService in DefaultDispatcherResourceManagerComponentFactory#create

Hi,

I'm reading flink runtime source code recently, and I found that when create DispatcherResourceManagerComponent in {@link org.apache.flink.runtime.entrypoint.component.DefaultDispatcherResourceManagerComponentFactory#create}, it will start DispatcherLeaderRetrieverService and ResourceManagerLeaderRetrieverService which is in order to watch the leader address change.

And my question is - as this piece of code is part of jobmanager starting process which is meant to be the leader, then why does it watch the leader itself?

dispatcherLeaderRetrievalService = highAvailabilityServices.getDispatcherLeaderRetriever();

resourceManagerRetrievalService = highAvailabilityServices.getResourceManagerLeaderRetriever();

Thank

Andrey Zagrebin-5

Re: Purpose of starting LeaderRetrievalService in DefaultDispatcherResourceManagerComponentFactory#create

Hi Linlin,

There may be a historic confusion in terminology.

We often refer to 'JobManager' as a component which manages a single job. Names of all related classes usually contain 'JobManager'.

At the same time, we can refer to it as a master process in Flink's cluster, potentially running multiple jobs.

Currently, the master process is a single JVM process (ClusterEntryPoint) running on one machine and it consists of multiple components:

- dispatcher

- resource manager

- job manager per job or one job manager in case of per-job-cluster

All components are designed in a way that they can be separate remote JVM processes, potentially running on different machines, it is just not the case now.

If they are separate remote processes then each of them needs its own leader service to discover each other. This is why it is implemented like you see it.

Best,

Andrey

On Wed, Jul 1, 2020 at 4:18 AM Linlin Zhou <[hidden email]> wrote:

Hi,
I'm reading flink runtime source code recently, and I found that when create DispatcherResourceManagerComponent in {@link org.apache.flink.runtime.entrypoint.component.DefaultDispatcherResourceManagerComponentFactory#create}, it will start DispatcherLeaderRetrieverService and ResourceManagerLeaderRetrieverService which is in order to watch the leader address change.
And my question is - as this piece of code is part of jobmanager starting process which is meant to be the leader, then why does it watch the leader itself?
dispatcherLeaderRetrievalService = highAvailabilityServices.getDispatcherLeaderRetriever();

resourceManagerRetrievalService = highAvailabilityServices.getResourceManagerLeaderRetriever();
Thank