yarn session: one JVM per task

aldu29
Hi,
My app is based on a lib that is not thread-safe (yet...).
While waiting for the patch to be pushed, how can I be sure that my Sink that uses this lib is in one JVM?
Context: I use one YARN session and submit my Flink jobs to this session.

Regards,
David

Re: yarn session: one JVM per task

Xintong Song
Hi David,

In general, I don't think the current Flink lets you force all parallel subtasks of a given task to run in the same JVM process.

If your job is very small, one thing you might try is to run only one task manager in the Flink session cluster. You need to make sure the task manager has enough CPU/memory resources and slots to run your job.

Thank you~

Xintong Song



Re: yarn session: one JVM per task

aldu29
Hi,

Thanks Xintong.
I've noticed that when I use yarn-session.sh with the --slots (-s) parameter but without --container (-n), it creates one task/slot per TaskManager. Before, with both -n and -s, that was not the case.
I prefer to use only small containers with a single task each, both to scale my pipeline and, of course, to avoid thread-safety issues.
Do you think I can rely on that behaviour?

Regards,
David

Re: yarn session: one JVM per task

Xintong Song
Depending on your Flink version, the '-n' option might not take effect. It has been removed in the latest release, but before that there were a few versions in which the option was still accepted but had no effect.

Anyway, as long as you have multiple containers, I don't think there's a way to force particular tasks to be scheduled into the same JVM, at least none that I'm aware of.


Thank you~

Xintong Song



Re: yarn session: one JVM per task

aldu29
Hi Xintong,

At the moment I'm using Flink 1.9.2 with this command:
   yarn-session.sh -d -s 1 -jm 4096 -tm 4096 -qu "XXX" -nm "MyPipeline"
So, after a lot of tests, I've noticed that if I increase the parallelism of my custom Sink, each task is placed in its own task slot and, most importantly, each slot in its own TaskManager (in fact, its own YARN container).
So, if I understand correctly, I have to stay on this Flink release (1.9.2)?

Thanks
David



Re: yarn session: one JVM per task

Gary Yao-5
Hi David,

> Before with the both n and -s it was not the case.

What do you mean by before? At least in 1.8 "-s" could be used to specify the
number of slots per TM [1].


> how can I be sure that my Sink that uses this lib is in one JVM ?

Is it enough that no other parallel instance of your sink runs in the same
JVM? If that is the case, it is enough to start your YARN session with:

    ./bin/yarn-session.sh -s 1 [...]

This will result in exactly one slot per TM. Note that a single slot may still
hold several subtasks of the job (Slot Sharing) but never two parallel
instances of your sink [2]. You can also control Slot Sharing manually [3].


> So, if I understand I have to keep this Flink release (1.9.2) ?

I don't see why 1.10.0 would not work for you.


Best,
Gary

[1] https://ci.apache.org/projects/flink/flink-docs-release-1.8/ops/deployment/yarn_setup.html#start-a-session
[2] https://ci.apache.org/projects/flink/flink-docs-stable/concepts/runtime.html#task-slots-and-resources
[3] https://ci.apache.org/projects/flink/flink-docs-release-1.10/dev/stream/operators/#task-chaining-and-resource-groups
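
To make the slot-sharing point above concrete, here is a toy model in plain Java. It is not Flink's scheduler and uses no Flink API: `SlotModel`, `placeSinkSubtasks` and `maxSinkInstancesPerJvm` are hypothetical names, and the model simply assumes that slots are filled TaskManager by TaskManager and that a slot never holds two parallel instances of the same operator. Under those assumptions, one slot per TM means at most one sink instance per JVM.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy model only: illustrates the placement rule, not Flink's real scheduling. */
final class SlotModel {
    private SlotModel() {}

    /**
     * Returns, for each TaskManager, the sink subtask indices placed on it.
     * Assumption: each slot takes exactly one sink subtask, and slots are
     * filled TaskManager by TaskManager.
     */
    static List<List<Integer>> placeSinkSubtasks(int sinkParallelism, int slotsPerTaskManager) {
        if (sinkParallelism < 1 || slotsPerTaskManager < 1) {
            throw new IllegalArgumentException("parallelism and slots must be >= 1");
        }
        List<List<Integer>> taskManagers = new ArrayList<>();
        for (int subtask = 0; subtask < sinkParallelism; subtask++) {
            int tm = subtask / slotsPerTaskManager;
            if (tm == taskManagers.size()) {
                taskManagers.add(new ArrayList<>()); // a new TM (container) is needed
            }
            taskManagers.get(tm).add(subtask);
        }
        return taskManagers;
    }

    /** Largest number of sink instances that end up sharing one JVM in this model. */
    static int maxSinkInstancesPerJvm(int sinkParallelism, int slotsPerTaskManager) {
        int max = 0;
        for (List<Integer> tm : placeSinkSubtasks(sinkParallelism, slotsPerTaskManager)) {
            max = Math.max(max, tm.size());
        }
        return max;
    }

    public static void main(String[] args) {
        System.out.println(placeSinkSubtasks(4, 1)); // [[0], [1], [2], [3]]
        System.out.println(placeSinkSubtasks(4, 2)); // [[0, 1], [2, 3]]
    }
}
```

With `-s 1` the model gives one sink subtask per TM whatever the parallelism; with two slots per TM, two sink instances can share a JVM, which is the case to avoid for a library that is not thread-safe.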

Re: yarn session: one JVM per task

aldu29
Hi Gary,

Sorry, I was probably not very clear.
Yes, that's exactly what I wanted to hear :)
I use the -s 1 parameter, and what I expect is one task of my Sink (in fact, one instance) per TM, i.e. per JVM.
That is the behaviour I see in my tests, but I want to be sure.
Thanks a lot

David
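
For anyone who wants a runtime check on top of the `-s 1` setting, one possible safeguard is a fail-fast guard inside the sink itself. The sketch below is plain Java with hypothetical names (`OneSinkPerJvmGuard`, `acquire`, `release`); it is not part of Flink. It assumes the sink would call `acquire()` when it starts and `release()` when it shuts down, for example from `open()` and `close()` if the sink extends `RichSinkFunction`. Static state is per classloader, so the guard only sees instances loaded through the same classloader; whether that holds across several jobs in one session is something to verify.

```java
import java.util.concurrent.atomic.AtomicBoolean;

/**
 * Hypothetical helper (not a Flink class): fails fast if a second sink
 * instance starts in the same JVM while the first one is still active.
 */
final class OneSinkPerJvmGuard {
    // One flag per JVM, more precisely per classloader that loaded this class.
    private static final AtomicBoolean IN_USE = new AtomicBoolean(false);

    private OneSinkPerJvmGuard() {}

    /** Call when a sink instance starts; throws if another instance holds the guard. */
    static void acquire() {
        if (!IN_USE.compareAndSet(false, true)) {
            throw new IllegalStateException(
                "Another sink instance is already active in this JVM");
        }
    }

    /** Call when the sink instance shuts down. */
    static void release() {
        IN_USE.set(false);
    }

    public static void main(String[] args) {
        acquire();                  // first instance: succeeds
        try {
            acquire();              // second instance in the same JVM: rejected
        } catch (IllegalStateException e) {
            System.out.println("rejected: " + e.getMessage());
        }
        release();
        acquire();                  // succeeds again once the first one is released
        release();
    }
}
```

A failed `acquire()` turns a silent thread-safety problem into an immediate, visible job failure, which is easier to diagnose than corrupted output from the library.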

Re: yarn session: one JVM per task

Xintong Song
Ah, I misunderstood and thought that you wanted to keep all your Sink instances on the same TM.

If what you want is one instance per TM, then, as Gary mentioned, specifying "-s 1" when starting the session is enough, and it should work with all existing versions from 1.8 onwards.

Thank you~

Xintong Song



Re: yarn session: one JVM per task

aldu29
Perfect.
No problem, my bad: I was not really clear.
Thanks!
