Which mode to choose for Flink on YARN?

weilongxing

There are two ways to deploy Flink applications on YARN. The first is to start a yarn-session and deploy all Flink applications into that session. The second is to deploy each Flink application on YARN as its own YARN application.

My question is: what is the difference between these two methods, and which one should I choose for a production environment?

I can't find any material about this.

I think the first method saves resources, since it needs only one JobManager (the YARN application master). That is also its disadvantage, though: the single JobManager could become a bottleneck as the number of Flink applications grows.

Re: Which mode to choose for Flink on YARN?

vino yang
Hi weilong,

As you said, each of the two approaches has advantages and disadvantages.
However, the "single job" mode has one major advantage over the "YARN Flink session" mode: it provides job-level isolation, for both the JobManager (JM) and the TaskManagers (TM).
This gives you finer-grained control over each job, and the FLIP-6 refactoring of Flink's deployment model also leans toward the "single job" mode.
The cost is that it starts more JMs (application masters) and takes up more resources.
In the end, you need to evaluate and weigh the trade-offs for your own case.

Thanks, vino.

In reply to weilongxing <[hidden email]>, Thu, Sep 20, 2018 at 10:27 AM


Re: Which mode to choose for Flink on YARN?

tison
Hi weilong,

As vino said, the main advantage of per-job mode is that it provides job-level isolation. The main advantage of session mode is that it sets up a persistent session that accepts jobs, which reduces the overhead of requesting and setting up resources for each job. In addition, per-job mode calculates the resources the job requires, while session mode requires you to configure the persistent session statically.

As advice from experience: prefer per-job mode for large jobs, and session mode for a series of small jobs.
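
For illustration, the two modes might be started roughly as follows. This is a minimal dry-run sketch that only prints the commands. It assumes the Flink 1.5/1.6-era CLI; the install path, jar path, and memory sizes are placeholder values, not recommendations.

```shell
#!/usr/bin/env bash
# Dry run: each function prints a command instead of executing it.
# FLINK_HOME and JOB_JAR are placeholders (assumptions), adjust to your setup.
FLINK_HOME="${FLINK_HOME:-/opt/flink}"
JOB_JAR="${JOB_JAR:-./examples/batch/WordCount.jar}"

# Session mode, step 1: start one long-running YARN application
# (detached with -d) that will accept many jobs.
session_start_cmd() {
  echo "$FLINK_HOME/bin/yarn-session.sh -jm 1024 -tm 4096 -d"
}

# Session mode, step 2: submit a job to the running session.
session_submit_cmd() {
  echo "$FLINK_HOME/bin/flink run $JOB_JAR"
}

# Per-job mode: every submission starts its own YARN application,
# with its own JobManager and TaskManagers.
per_job_cmd() {
  echo "$FLINK_HOME/bin/flink run -m yarn-cluster -yjm 1024 -ytm 4096 $JOB_JAR"
}

session_start_cmd
session_submit_cmd
per_job_cmd
```

The difference that matters for the choice is visible in the last function: per-job mode pays the YARN startup cost on every `flink run`, while session mode pays it once in `yarn-session.sh`.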

Best,
tison.


In reply to vino yang <[hidden email]>, Thu, Sep 20, 2018 at 2:17 PM

Re: Which mode to choose for Flink on YARN?

weilongxing
In reply to this post by vino yang
Thanks.

I am wondering whether the JobManager will become a bottleneck, and how many jobs one JobManager can support in session mode. I tried to find the bottleneck in a test environment but did not manage to.



Re: Which mode to choose for Flink on YARN?

tison
That mainly depends on the parallelism of your jobs.

The JobManager usually becomes a bottleneck because it is busy handling RPC requests and garbage collection (GC). Most of the time you can address this by giving the JM more memory, for example by passing `-jm 4096` when you start `yarn-session.sh`.
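
As a sketch of that suggestion, the snippet below prints a session start command with a configurable JobManager memory size. It is a dry run only; the helper name `session_cmd` is made up for this example, and the `-d` (detached) flag is an addition that the advice above does not require.

```shell
#!/usr/bin/env bash
# Dry run: print the yarn-session.sh command instead of executing it.
# session_cmd is a hypothetical helper; $1 is the JobManager memory in MB
# and defaults to the 4096 suggested above.
session_cmd() {
  jm_memory_mb="${1:-4096}"
  echo "./bin/yarn-session.sh -jm ${jm_memory_mb} -d"
}

session_cmd        # uses the default of 4096 MB
session_cmd 8192   # a larger value, if GC pressure persists
```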

Best,
tison.


In reply to weilongxing <[hidden email]>, Thu, Sep 20, 2018 at 2:29 PM

Re: Which mode to choose for Flink on YARN?

weilongxing
In reply to this post by tison
"In addition, per-job mode calculates the resources the job requires, while session mode requires you to configure the persistent session statically."

I tested with Flink 1.5.2 and found that session mode can also allocate resources dynamically. You don't have to configure them statically.


I am thinking about combining the two modes: I could start one session per project and submit all jobs of a project to that project's session.
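
A rough sketch of this one-session-per-project idea, again as a dry run that only prints commands. The project name, memory size, jar path, and YARN application ID are all hypothetical. To the best of my knowledge, `-nm` sets the YARN application name of a session and `-yid` attaches `flink run` to a running session in the Flink 1.5-era CLI, but check both against your version.

```shell
#!/usr/bin/env bash
# Dry run: both helpers print commands instead of executing them.
# Helper names and all argument values are hypothetical.

# Start one named, detached session for a project. $1 = project name.
start_project_session() {
  echo "./bin/yarn-session.sh -nm flink-session-$1 -jm 2048 -d"
}

# Submit a job to a specific session.
# $1 = YARN application ID of that session, $2 = job jar.
submit_to_session() {
  echo "./bin/flink run -yid $1 $2"
}

start_project_session billing
submit_to_session application_1537400000000_0001 ./billing-job.jar
```

With this layout, jobs are isolated between projects but still share one JobManager within a project, so the bottleneck question applies per session rather than cluster-wide.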


In reply to 陈梓立 <[hidden email]>, Sep 20, 2018 at 2:23 PM
