Re: yarn session: one JVM per task

Posted by Gary Yao-5 on
URL: http://deprecated-apache-flink-user-mailing-list-archive.369.s1.nabble.com/yarn-session-one-JVM-per-task-tp33014p33065.html

Hi David,

Before with the both n and -s it was not the case.

What do you mean by before? At least in 1.8 "-s" could be used to specify the
number of slots per TM.


how can I be sure that my Sink that uses this lib is in one JVM ?

Is it enough that no other parallel instance of your sink runs in the same
JVM? If that is the case, it is enough to start your your YARN session with:

    ./bin/yarn-session.sh -s 1 [...]

This will result in exactly one slot per TM. Note that a single slot may still
hold several subtasks of the job (Slot Sharing) but never two parallel
instances of your sink [2]. You can also control Slot Sharing manually [3].


So, if I understand I have to keep this Flink release (1.9.2) ?

I don't see why 1.10.0 would not work for you.


Best,
Gary

[1] https://ci.apache.org/projects/flink/flink-docs-release-1.8/ops/deployment/yarn_setup.html#start-a-session
[2] https://ci.apache.org/projects/flink/flink-docs-stable/concepts/runtime.html#task-slots-and-resources
[3] https://ci.apache.org/projects/flink/flink-docs-release-1.10/dev/stream/operators/#task-chaining-and-resource-groups

On Tue, Feb 25, 2020 at 10:28 AM David Morin <[hidden email]> wrote:
Hi Xintong,

At the moment I'm using the 1.9.2 with this command:
   yarn-session.sh -d -s 1 -jm 4096 -tm 4096 -qu "XXX" -nm "MyPipeline"
So, after a lot of tests, I've noticed that if I increase the parallelism of my Custom Sink, each task is embedded into one TS and, the most important, each one into one TaskManager (Yarn container in fact).
So, if I understand I have to keep this Flink release (1.9.2) ?

Thanks
David



Le mar. 25 févr. 2020 à 02:02, Xintong Song <[hidden email]> a écrit :
Depending on your Flink version, the '-n' option might not take effect. It is removed in the latest release, but before that there were a few versions where this option is neither removed nor taking effect.

Anyway, as long as you have multiple containers, I don't think there's a way to make some of the tasks scheduled to the same JVM. Not that I'm aware of.


Thank you~

Xintong Song



On Mon, Feb 24, 2020 at 8:43 PM David Morin <[hidden email]> wrote:
Hi,

Thanks Xintong.
I've noticed than when I use yarn-session.sh with --slots (-s) parameter but without --container (-n) it creates one task/slot per taskmanager. Before with the both n and -s it was not the case.
I prefer to use only small container with only one task to scale my pipeline and of course to prevent from thread-safe issue
Do you think I cannot be confident on that behaviour ?

Regards,
David

On 2020/02/22 17:11:25, David Morin <[hidden email]> wrote:
> Hi,
> My app is based on a lib that is not thread safe (yet...).
> In waiting of the patch has been pushed, how can I be sure that my Sink that uses this lib is in one JVM ?
> Context: I use one Yarn session and send my Flink jobs to this session
>
> Regards,
> David
>