Hello everybody,
I'm currently studying how the Flink/YARN integration works. Right now I'm a little confused about the practical difference between having a long-running session on which one deploys several jobs and deploying these jobs individually.

My intuition (which may not be correct) is that in the former case a pool of resources is allocated to the session and the usage of these is then handled by the Flink JobManager, while in the latter case the resource allocation is handled directly by YARN on a per-job basis. Am I right?

If what I said is (more or less) right, and apart from security concerns (which have been discussed in a previous thread), are there any further practical differences between having a long-running session and letting YARN handle jobs?

Thank you in advance!

BR,
Stefano Baghino
Hi Stefano,
Essentially, the Yarn session is not much different from a per-job Yarn cluster. In either case, a Flink cluster is brought up with resources provided by Yarn. In the case of the Yarn session, this cluster doesn't do anything until a job is submitted. In the case of the per-job Yarn cluster, a job is submitted immediately after startup and the cluster is shut down once that job has completed. That's all.

We're currently working on integrating proper resource allocation into the JobManager. As of now, everything is static, i.e. the JobManager won't allocate more than the initially requested resources.

Cheers,
Max

On Mon, Mar 7, 2016 at 1:38 PM, Stefano Baghino wrote:
> Hello everybody,
> [...]
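To make the lifecycle difference described above a bit more concrete, here is a minimal toy model in Python. It is only a sketch under stated assumptions: `ToyFlinkCluster`, `start_session`, and `run_per_job` are hypothetical names invented for illustration and are not part of Flink or YARN. Jobs "complete" instantly, and the only thing modelled is when the cluster starts, when it stops, and that its size is fixed at startup.

```python
# Toy model only: these names are hypothetical, not Flink or YARN APIs.

class ToyFlinkCluster:
    """A Flink cluster brought up on resources granted by YARN (toy model)."""

    def __init__(self, requested_slots):
        # Static allocation: the size is fixed when the cluster starts.
        self.slots = requested_slots
        self.running = True
        self.completed = []

    def submit(self, job_name, slots_needed):
        if not self.running:
            raise RuntimeError("cluster has been shut down")
        if slots_needed > self.slots:
            # The toy JobManager never grows beyond what was requested initially.
            raise RuntimeError("job needs more than the initially requested resources")
        # Jobs finish instantly in this model; we only record that they ran.
        self.completed.append(job_name)

    def shutdown(self):
        self.running = False


def start_session(requested_slots):
    """Yarn session: the cluster comes up and idles until jobs are submitted."""
    return ToyFlinkCluster(requested_slots)


def run_per_job(job_name, slots_needed):
    """Per-job cluster: start, submit one job right away, then shut down."""
    cluster = ToyFlinkCluster(slots_needed)
    cluster.submit(job_name, slots_needed)
    cluster.shutdown()
    return cluster


session = start_session(requested_slots=4)
idle_at_start = session.completed == []  # nothing runs until a job is submitted
session.submit("job-a", 2)
session.submit("job-b", 4)

per_job = run_per_job("job-c", 2)
```

In this sketch the session object stays usable after `job-a` and `job-b`, while the per-job cluster rejects any further submission once its single job is done. In both modes a job asking for more than the initial request is refused, which mirrors the "everything is static" remark.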
Good, thank you for the explanation!

On Mon, Mar 7, 2016 at 2:38 PM, Maximilian Michels wrote:
> Hi Stefano,
> [...]

BR,
Stefano Baghino
One last question: running multiple jobs means that each one has its own JobManager, right?

On Mon, Mar 7, 2016 at 3:14 PM, Stefano Baghino wrote:
> Good, thank you for the explanation!

BR,
Stefano Baghino
For the per-job cluster: Yes, the JobManager is started exclusively for the job.
For the Yarn session: No, the JobManager stays alive during the entire session and may execute one or more jobs (one after another or even at the same time).

On Mon, Mar 7, 2016 at 6:37 PM, Stefano Baghino wrote:
> One last question: running multiple jobs means that each one has its own
> JobManager, right?
> [...]
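As a rough illustration of the JobManager answer in this thread (one JobManager per job for per-job clusters, one shared JobManager for a Yarn session), the following Python sketch counts how many JobManagers each mode would start for the same list of jobs. All names here (`ToyJobManager`, `per_job_mode`, `session_mode`) are hypothetical and exist only for this example; nothing in it calls Flink.

```python
# Toy model only: hypothetical names, not Flink APIs.

class ToyJobManager:
    """Stands in for a JobManager and remembers which jobs it executed."""

    def __init__(self):
        self.jobs = []

    def execute(self, job_name):
        self.jobs.append(job_name)


def per_job_mode(job_names):
    """Per-job cluster: a JobManager is started exclusively for each job."""
    managers = []
    for name in job_names:
        jm = ToyJobManager()
        jm.execute(name)
        managers.append(jm)
    return managers


def session_mode(job_names):
    """Yarn session: one JobManager stays alive and executes every job."""
    jm = ToyJobManager()
    for name in job_names:
        jm.execute(name)
    return [jm]


jobs = ["job-a", "job-b", "job-c"]
per_job_jms = per_job_mode(jobs)
session_jms = session_mode(jobs)
print(len(per_job_jms), len(session_jms))
```

With three jobs, the per-job mode ends up with three JobManagers holding one job each, and the session mode with a single JobManager holding all three.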