One TaskManager per node or multiple TaskManager per node

classic Classic list List threaded Threaded
6 messages Options
Reply | Threaded
Open this post in threaded view
|

One TaskManager per node or multiple TaskManager per node

Ethan Li
Hello,

I am setting up a standalone flink cluster and I am wondering what’s the best way to distribute TaskManagers.  Do we usually launch one TaskManager (with many slots) per node or multiple TaskManagers per node (with smaller number of slots per tm) ?  Also with one TaskManager per node, I am seeing that TM launches with only 30GB JVM heap by default while the node has 180 GB. Why is it not launching with more memory since there is a lot available?

Thank you very much!

- Ethan
Reply | Threaded
Open this post in threaded view
|

Re: One TaskManager per node or multiple TaskManager per node

Jamie Grier-2
There are a lot of different ways to deploy Flink.  It would be easier to answer your question with a little more context about your use case but in general I would advocate the following:

1) Don't run a "permanent" Flink cluster and then submit jobs to it.  Instead what you should do is run an "ephemeral" cluster per job if possible.  This keeps jobs completely isolated from each other which helps a lot with understanding performance, debugging, looking at logs, etc.
2) Given that you can do #1 and you are running on bare metal (as opposed to in containers) then run one TM per physical machine.

There are many ways to accomplish the above depending on your deployment infrastructure (YARN, K8S, bare metal, VMs, etc) so it's hard to give detailed input but general you'll have the best luck if you don't run multiple jobs in the same TM/JVM.

In terms of the TM memory usage you can set that up by configuring it in the flink-conf.yaml file.  The config key you are looking or is taskmanager.heap.size: https://ci.apache.org/projects/flink/flink-docs-release-1.7/ops/config.html#taskmanager-heap-size


On Mon, Jan 14, 2019 at 8:05 AM Ethan Li <[hidden email]> wrote:
Hello,

I am setting up a standalone flink cluster and I am wondering what’s the best way to distribute TaskManagers.  Do we usually launch one TaskManager (with many slots) per node or multiple TaskManagers per node (with smaller number of slots per tm) ?  Also with one TaskManager per node, I am seeing that TM launches with only 30GB JVM heap by default while the node has 180 GB. Why is it not launching with more memory since there is a lot available?

Thank you very much!

- Ethan
Reply | Threaded
Open this post in threaded view
|

Re: One TaskManager per node or multiple TaskManager per node

Ethan Li
Thank you Jamie!

Sorry didn’t add more context because it’s mostly a general question without any specific use cases in mind.

We currently deploy flink on bare metal and then submit jobs to it. And it’s how we deploy storm cluster. Looks like we need to move away from this setup for flink. We also have plans to use flink-on-yarn and it should be easier to achieve “ephemeral” setup then. 

But before that happens, is there any easy way to achieve it on bare metal? Thank you very much!


Best,
Ethan



On Jan 14, 2019, at 1:39 PM, Jamie Grier <[hidden email]> wrote:

There are a lot of different ways to deploy Flink.  It would be easier to answer your question with a little more context about your use case but in general I would advocate the following:

1) Don't run a "permanent" Flink cluster and then submit jobs to it.  Instead what you should do is run an "ephemeral" cluster per job if possible.  This keeps jobs completely isolated from each other which helps a lot with understanding performance, debugging, looking at logs, etc.
2) Given that you can do #1 and you are running on bare metal (as opposed to in containers) then run one TM per physical machine.

There are many ways to accomplish the above depending on your deployment infrastructure (YARN, K8S, bare metal, VMs, etc) so it's hard to give detailed input but general you'll have the best luck if you don't run multiple jobs in the same TM/JVM.

In terms of the TM memory usage you can set that up by configuring it in the flink-conf.yaml file.  The config key you are looking or is taskmanager.heap.size: https://ci.apache.org/projects/flink/flink-docs-release-1.7/ops/config.html#taskmanager-heap-size


On Mon, Jan 14, 2019 at 8:05 AM Ethan Li <[hidden email]> wrote:
Hello,

I am setting up a standalone flink cluster and I am wondering what’s the best way to distribute TaskManagers.  Do we usually launch one TaskManager (with many slots) per node or multiple TaskManagers per node (with smaller number of slots per tm) ?  Also with one TaskManager per node, I am seeing that TM launches with only 30GB JVM heap by default while the node has 180 GB. Why is it not launching with more memory since there is a lot available?

Thank you very much!

- Ethan

Reply | Threaded
Open this post in threaded view
|

Re: One TaskManager per node or multiple TaskManager per node

bastien dine
In reply to this post by Jamie Grier-2
Hello Jamie,

Does #1 apply to batch jobs too ? 

Regards,

------------------

Bastien DINE
Data Architect / Software Engineer / Sysadmin
bastiendine.io


Le lun. 14 janv. 2019 à 20:39, Jamie Grier <[hidden email]> a écrit :
There are a lot of different ways to deploy Flink.  It would be easier to answer your question with a little more context about your use case but in general I would advocate the following:

1) Don't run a "permanent" Flink cluster and then submit jobs to it.  Instead what you should do is run an "ephemeral" cluster per job if possible.  This keeps jobs completely isolated from each other which helps a lot with understanding performance, debugging, looking at logs, etc.
2) Given that you can do #1 and you are running on bare metal (as opposed to in containers) then run one TM per physical machine.

There are many ways to accomplish the above depending on your deployment infrastructure (YARN, K8S, bare metal, VMs, etc) so it's hard to give detailed input but general you'll have the best luck if you don't run multiple jobs in the same TM/JVM.

In terms of the TM memory usage you can set that up by configuring it in the flink-conf.yaml file.  The config key you are looking or is taskmanager.heap.size: https://ci.apache.org/projects/flink/flink-docs-release-1.7/ops/config.html#taskmanager-heap-size


On Mon, Jan 14, 2019 at 8:05 AM Ethan Li <[hidden email]> wrote:
Hello,

I am setting up a standalone flink cluster and I am wondering what’s the best way to distribute TaskManagers.  Do we usually launch one TaskManager (with many slots) per node or multiple TaskManagers per node (with smaller number of slots per tm) ?  Also with one TaskManager per node, I am seeing that TM launches with only 30GB JVM heap by default while the node has 180 GB. Why is it not launching with more memory since there is a lot available?

Thank you very much!

- Ethan
Reply | Threaded
Open this post in threaded view
|

Re: One TaskManager per node or multiple TaskManager per node

Jamie Grier-2
Ethan, it depends on what you mean by easy ;)  It just depends a lot on what infra tools you already have in place.  On bare metal it's probably safe to say there is no "easy" way.  You need a lot of automation to make it easy.

Bastien, IMO, #1 applies to batch jobs as well.

On Tue, Jan 15, 2019 at 6:27 AM bastien dine <[hidden email]> wrote:
Hello Jamie,

Does #1 apply to batch jobs too ? 

Regards,

------------------

Bastien DINE
Data Architect / Software Engineer / Sysadmin
bastiendine.io


Le lun. 14 janv. 2019 à 20:39, Jamie Grier <[hidden email]> a écrit :
There are a lot of different ways to deploy Flink.  It would be easier to answer your question with a little more context about your use case but in general I would advocate the following:

1) Don't run a "permanent" Flink cluster and then submit jobs to it.  Instead what you should do is run an "ephemeral" cluster per job if possible.  This keeps jobs completely isolated from each other which helps a lot with understanding performance, debugging, looking at logs, etc.
2) Given that you can do #1 and you are running on bare metal (as opposed to in containers) then run one TM per physical machine.

There are many ways to accomplish the above depending on your deployment infrastructure (YARN, K8S, bare metal, VMs, etc) so it's hard to give detailed input but general you'll have the best luck if you don't run multiple jobs in the same TM/JVM.

In terms of the TM memory usage you can set that up by configuring it in the flink-conf.yaml file.  The config key you are looking or is taskmanager.heap.size: https://ci.apache.org/projects/flink/flink-docs-release-1.7/ops/config.html#taskmanager-heap-size


On Mon, Jan 14, 2019 at 8:05 AM Ethan Li <[hidden email]> wrote:
Hello,

I am setting up a standalone flink cluster and I am wondering what’s the best way to distribute TaskManagers.  Do we usually launch one TaskManager (with many slots) per node or multiple TaskManagers per node (with smaller number of slots per tm) ?  Also with one TaskManager per node, I am seeing that TM launches with only 30GB JVM heap by default while the node has 180 GB. Why is it not launching with more memory since there is a lot available?

Thank you very much!

- Ethan
Reply | Threaded
Open this post in threaded view
|

Re: One TaskManager per node or multiple TaskManager per node

Ethan Li
It makes sense. Thank you very much, Jamie! 



On Jan 15, 2019, at 12:48 PM, Jamie Grier <[hidden email]> wrote:

Ethan, it depends on what you mean by easy ;)  It just depends a lot on what infra tools you already have in place.  On bare metal it's probably safe to say there is no "easy" way.  You need a lot of automation to make it easy.

Bastien, IMO, #1 applies to batch jobs as well.

On Tue, Jan 15, 2019 at 6:27 AM bastien dine <[hidden email]> wrote:
Hello Jamie,

Does #1 apply to batch jobs too ? 

Regards,

------------------

Bastien DINE
Data Architect / Software Engineer / Sysadmin
bastiendine.io


Le lun. 14 janv. 2019 à 20:39, Jamie Grier <[hidden email]> a écrit :
There are a lot of different ways to deploy Flink.  It would be easier to answer your question with a little more context about your use case but in general I would advocate the following:

1) Don't run a "permanent" Flink cluster and then submit jobs to it.  Instead what you should do is run an "ephemeral" cluster per job if possible.  This keeps jobs completely isolated from each other which helps a lot with understanding performance, debugging, looking at logs, etc.
2) Given that you can do #1 and you are running on bare metal (as opposed to in containers) then run one TM per physical machine.

There are many ways to accomplish the above depending on your deployment infrastructure (YARN, K8S, bare metal, VMs, etc) so it's hard to give detailed input but general you'll have the best luck if you don't run multiple jobs in the same TM/JVM.

In terms of the TM memory usage you can set that up by configuring it in the flink-conf.yaml file.  The config key you are looking or is taskmanager.heap.size: https://ci.apache.org/projects/flink/flink-docs-release-1.7/ops/config.html#taskmanager-heap-size


On Mon, Jan 14, 2019 at 8:05 AM Ethan Li <[hidden email]> wrote:
Hello,

I am setting up a standalone flink cluster and I am wondering what’s the best way to distribute TaskManagers.  Do we usually launch one TaskManager (with many slots) per node or multiple TaskManagers per node (with smaller number of slots per tm) ?  Also with one TaskManager per node, I am seeing that TM launches with only 30GB JVM heap by default while the node has 180 GB. Why is it not launching with more memory since there is a lot available?

Thank you very much!

- Ethan