(DEPRECATED) Apache Flink User Mailing List archive.

Role of Job Manager

Pankaj Chand

Role of Job Manager

I am trying to understand the role of Job Manager in Flink, and have come across two possibly distinct interpretations.

1. The online documentation for v1.8 indicates that there is at least one Job Manager in a cluster, and that it is closely tied to the cluster of machines, managing all jobs in that cluster.

This suggests that Flink's Job Manager is much like Hadoop's Application Manager.

2. The book, "Stream Processing with Apache Flink", writes that, "The Job Manager is the master process that controls the execution of a single application—each application is controlled by a different Job Manager."

This suggests that Flink defaults to one Job Manager per job, with the Job Manager closely tied to that single job, much like Hadoop's Application Master for each job.

Please let me know which one is correct.

Pankaj

Eduardo Winpenny Tejedor

Re: Role of Job Manager

Hi Pankaj,

I have no experience with Hadoop, but from the book I gathered that there is one Job Manager per application, i.e. per jar (as in the example in the first chapter). This is not to say there's one Job Manager per job. Actually, I don't think the word "Job" is defined in the book. I've seen "Task" defined, and tasks do have Task Managers.

Hope this is along the right lines

Regards,

Eduardo

On Tue, 18 Jun 2019, 08:42 Pankaj Chand, <[hidden email]> wrote:


Biao Liu

Re: Role of Job Manager

Hi Pankaj,

That's a really good question. The architecture was refactored a while ago [1], so some descriptions may still use the outdated concept.

Before the refactoring, the Job Manager was a centralized role. It controlled the whole cluster and all jobs, as described in your interpretation 1.

After the refactoring, the old Job Manager was split into several roles: Resource Manager, Dispatcher, the new Job Manager, and so on. The new Job Manager is responsible for only one job, as described in your interpretation 2.
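To make the split a bit more concrete, here is a toy model in Python. It is only a sketch of the roles named above, not Flink code: every class, method, and field name is hypothetical, and the slot bookkeeping is a simplifying assumption. The point it illustrates is that the Dispatcher and Resource Manager exist once per cluster, while a separate Job Manager is created for each submitted job.

```python
from dataclasses import dataclass, field


@dataclass
class ResourceManager:
    """Cluster-wide role (hypothetical model): hands out task slots."""
    free_slots: int

    def request_slots(self, n: int) -> int:
        # Grant at most what is still free in the cluster.
        granted = min(n, self.free_slots)
        self.free_slots -= granted
        return granted


@dataclass
class JobManager:
    """Post-refactoring role: coordinates exactly one job."""
    job_name: str
    slots: int = 0


@dataclass
class Dispatcher:
    """Cluster-wide role: accepts submissions, starts one JobManager per job."""
    resource_manager: ResourceManager
    job_managers: dict = field(default_factory=dict)

    def submit(self, job_name: str, parallelism: int) -> JobManager:
        jm = JobManager(job_name)  # a fresh JobManager for this job only
        jm.slots = self.resource_manager.request_slots(parallelism)
        self.job_managers[job_name] = jm
        return jm


rm = ResourceManager(free_slots=8)
dispatcher = Dispatcher(rm)
a = dispatcher.submit("wordcount", parallelism=4)
b = dispatcher.submit("clickstream", parallelism=2)
print(len(dispatcher.job_managers), rm.free_slots)  # prints "2 2"
```

In the pre-refactoring picture, all three responsibilities would sit in one centralized object, which is what interpretation 1 describes.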

So the document you refer to is outdated. Would you mind sharing its URL? I think we should update it to avoid misleading more people.

[1] https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=65147077

Eduardo Winpenny Tejedor <[hidden email]> wrote on Wed, Jun 19, 2019 at 1:12 AM:

Pankaj Chand

Re: Role of Job Manager

Hi Biao,

Thank you for your reply!

Please let me know the URL of the updated Flink documentation.

The URL of the outdated document is:

https://ci.apache.org/projects/flink/flink-docs-release-1.8/concepts/runtime.html

Another page which (tacitly) supports the outdated concept is:

https://ci.apache.org/projects/flink/flink-docs-stable/internals/job_scheduling.html

The website that hosts these pages is also the first result when you search Google for "Flink documentation", and it claims to be the stable version. Its URL is:

https://ci.apache.org/projects/flink/flink-docs-stable/

Again, please let me know the URL of the updated Flink documentation.

Thank you Biao and Eduardo!

Pankaj

On Tue, Jun 18, 2019 at 11:49 PM Biao Liu <[hidden email]> wrote:
