Standalone cluster layout

classic Classic list List threaded Threaded
3 messages Options
Reply | Threaded
Open this post in threaded view
|

Standalone cluster layout

Avihai Berkovitz

Hi folks,

 

I am setting up a Flink cluster for testing, and I have a few questions regarding memory allocations:

  1. Is there a recommended limit to the size of a TaskManager heap? I saw that Flink uses G1GC, so we can use dozens of GB.
  2. Following the above question, should I use only one TaskManager process per machine, and give it all the available memory (minus a couple of GB for the OS)?
  3. Should I reserve some memory for RocksDB? The partitioned state will be around 500GB in size, and to my understanding RocksDB runs in native code and so uses off-heap memory.
  4. What is the recommended heap size of a JobManager? I expect that the cluster will run only 2 jobs at the same time.

 

The planned layout of the standalone cluster is:

  • 3 small JobManager machines, running:
    • 1 process of Zookeeper peer
    • 1 JobManager process
  • N large TaskManager machines, each running 1 TM process

 

Thanks!

Avihai

 

Reply | Threaded
Open this post in threaded view
|

Re: Standalone cluster layout

rmetzger0
Hi Avihai,

1. As much as possible (I would leave the operating system at least 1 GB of memory). It depends also on the workload you have. For streaming workload with very small state, you can use Flink with 1-2 GB of heap space and still get very good performance.
2. Yes, I would recommend to run one large Taskmanager per machine, because you save on "management overhead" and you benefit from faster data transfers locally.
3. If you give your Taskmanagers say 10 GB of heap, its likely that the process in the OS is using ~12 GB of memory in total (our network stack is also using some offheap memory). You can fine-tune the (memory) behavior of Rocks, but by default its not using a lot of memory.

4. I would give it at least 2 GB, if you run multiple jobs or larger jobs (high parallelism, many machines, many tasks), than maybe even more.


The layout of the standalone cluster looks good.
Where are you planning to write the state checkpoints to? Given that you have 500 Gb of state, you should consider how you want to store that state somewhere reliably. For larger states, its recommended to have a good network connection between the workers (machines running TMs) and the distributed file system (say S3, HDFS, ...).



On Tue, Dec 13, 2016 at 5:41 PM, Avihai Berkovitz <[hidden email]> wrote:

Hi folks,

 

I am setting up a Flink cluster for testing, and I have a few questions regarding memory allocations:

  1. Is there a recommended limit to the size of a TaskManager heap? I saw that Flink uses G1GC, so we can use dozens of GB.
  2. Following the above question, should I use only one TaskManager process per machine, and give it all the available memory (minus a couple of GB for the OS)?
  3. Should I reserve some memory for RocksDB? The partitioned state will be around 500GB in size, and to my understanding RocksDB runs in native code and so uses off-heap memory.
  4. What is the recommended heap size of a JobManager? I expect that the cluster will run only 2 jobs at the same time.

 

The planned layout of the standalone cluster is:

  • 3 small JobManager machines, running:
    • 1 process of Zookeeper peer
    • 1 JobManager process
  • N large TaskManager machines, each running 1 TM process

 

Thanks!

Avihai

 


Reply | Threaded
Open this post in threaded view
|

RE: Standalone cluster layout

Avihai Berkovitz

Thank you for the answers. The cluster will run in Azure, so I will be using HDFS over Azure Blob Store, as outlined in http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/Azure-Blob-Storage-Connector-td8536.html

I got pretty good performance in my tests, and it should handle scaling well. We will see how it performs under real production loads.

 

From: Robert Metzger [mailto:[hidden email]]
Sent: Wednesday, December 14, 2016 4:59 PM
To: [hidden email]
Subject: Re: Standalone cluster layout

 

Hi Avihai,

 

1. As much as possible (I would leave the operating system at least 1 GB of memory). It depends also on the workload you have. For streaming workload with very small state, you can use Flink with 1-2 GB of heap space and still get very good performance.

2. Yes, I would recommend to run one large Taskmanager per machine, because you save on "management overhead" and you benefit from faster data transfers locally.

3. If you give your Taskmanagers say 10 GB of heap, its likely that the process in the OS is using ~12 GB of memory in total (our network stack is also using some offheap memory). You can fine-tune the (memory) behavior of Rocks, but by default its not using a lot of memory.

 

4. I would give it at least 2 GB, if you run multiple jobs or larger jobs (high parallelism, many machines, many tasks), than maybe even more.

 

 

The layout of the standalone cluster looks good.

Where are you planning to write the state checkpoints to? Given that you have 500 Gb of state, you should consider how you want to store that state somewhere reliably. For larger states, its recommended to have a good network connection between the workers (machines running TMs) and the distributed file system (say S3, HDFS, ...).

 

 

 

On Tue, Dec 13, 2016 at 5:41 PM, Avihai Berkovitz <[hidden email]> wrote:

Hi folks,

 

I am setting up a Flink cluster for testing, and I have a few questions regarding memory allocations:

  1. Is there a recommended limit to the size of a TaskManager heap? I saw that Flink uses G1GC, so we can use dozens of GB.
  2. Following the above question, should I use only one TaskManager process per machine, and give it all the available memory (minus a couple of GB for the OS)?
  3. Should I reserve some memory for RocksDB? The partitioned state will be around 500GB in size, and to my understanding RocksDB runs in native code and so uses off-heap memory.
  4. What is the recommended heap size of a JobManager? I expect that the cluster will run only 2 jobs at the same time.

 

The planned layout of the standalone cluster is:

  • 3 small JobManager machines, running:
    • 1 process of Zookeeper peer
    • 1 JobManager process
  • N large TaskManager machines, each running 1 TM process

 

Thanks!

Avihai