Re: Flink on YARN: long-running session vs. one-off jobs

Posted by Maximilian Michels on
URL: http://deprecated-apache-flink-user-mailing-list-archive.369.s1.nabble.com/Flink-on-YARN-long-running-session-vs-one-off-jobs-tp5321p5333.html

For the per-job cluster: Yes, the JobManager is started exclusively for the job.

For the Yarn session: No, the JobManager stays alive during the entire
session and may execute one or more jobs (one after another or even at
the same time).
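
To make the lifecycle difference concrete, here is a minimal toy model in Python. It is only a sketch and not Flink code: the JobManager class and both helper functions are hypothetical names invented for illustration, and the model ignores resource allocation entirely.

```python
# Toy model only: these classes and functions are hypothetical and are
# not part of Flink. They only mirror the lifecycle described above.

class JobManager:
    """Stands in for a Flink JobManager; records the jobs it ran."""

    def __init__(self):
        self.alive = True
        self.jobs = []

    def run(self, job):
        if not self.alive:
            raise RuntimeError("JobManager has already shut down")
        self.jobs.append(job)

    def shutdown(self):
        self.alive = False


def run_per_job(jobs):
    """Per-job cluster: one dedicated JobManager per job, stopped afterwards."""
    managers = []
    for job in jobs:
        jm = JobManager()
        jm.run(job)
        jm.shutdown()  # the cluster goes away once its job has completed
        managers.append(jm)
    return managers


def run_session(jobs):
    """Yarn session: a single JobManager serves every submitted job."""
    jm = JobManager()
    for job in jobs:
        jm.run(job)
    return jm  # stays alive until the session itself is stopped


per_job = run_per_job(["job-a", "job-b"])
session = run_session(["job-a", "job-b"])
print(len(per_job), [jm.alive for jm in per_job])
print(session.alive, session.jobs)
```

In the per-job case every job gets its own short-lived JobManager, while in the session case a single JobManager outlives the individual jobs.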

On Mon, Mar 7, 2016 at 6:37 PM, Stefano Baghino
<[hidden email]> wrote:

> One last question: running multiple jobs means that each one has its own
> JobManager, right?
>
> On Mon, Mar 7, 2016 at 3:14 PM, Stefano Baghino
> <[hidden email]> wrote:
>>
>> Good, thank you for the explanation!
>>
>> On Mon, Mar 7, 2016 at 2:38 PM, Maximilian Michels <[hidden email]> wrote:
>>>
>>> Hi Stefano,
>>>
>>> Essentially the Yarn Session is not much different from a per-job Yarn
>>> cluster. In either case, a Flink cluster is brought up with resources
>>> provided by Yarn. In the case of the Yarn session, this cluster doesn't do
>>> anything until a job is submitted. In the case of the per-job Yarn
>>> cluster, a job is submitted immediately after startup and the cluster
>>> is shut down once that job has completed. That's all.
>>>
>>> We're currently working on integrating proper resource allocation into
>>> the JobManager. As of now, everything is static, i.e. the JobManager
>>> won't allocate more than the initially requested resources.
>>>
>>> Cheers,
>>> Max
>>>
>>> On Mon, Mar 7, 2016 at 1:38 PM, Stefano Baghino
>>> <[hidden email]> wrote:
>>> > Hello everybody,
>>> >
>>> > I'm currently studying how the Flink/YARN integration works. Right now
>>> > I'm a little confused about the practical difference between having a
>>> > long-running session on which one deploys several jobs and deploying
>>> > these jobs individually.
>>> >
>>> > My intuition (which may not be correct) is that in the former case a
>>> > pool of resources is allocated to the session and the usage of these
>>> > is then handled by the Flink JobManager, while in the latter case the
>>> > resource allocation is handled directly by YARN on a per-job basis. Am
>>> > I right?
>>> >
>>> > If what I said is (more or less) right, and apart from security concerns
>>> > (which have been discussed in a previous thread), are there any further
>>> > practical differences between having a long-running session and letting
>>> > YARN handle jobs?
>>> >
>>> > Thank you in advance!
>>> >
>>> > --
>>> > BR,
>>> > Stefano Baghino
>>> >
>>> > Software Engineer @ Radicalbit