Hi folks, I am setting up a Flink cluster for testing, and I have a few questions regarding memory allocations:
The planned layout of the standalone cluster is:
Thanks! Avihai
Hi Avihai,

1. As much as possible (I would leave the operating system at least 1 GB of memory). It also depends on the workload you have. For a streaming workload with very small state, you can use Flink with 1-2 GB of heap space and still get very good performance.
2. Yes, I would recommend running one large TaskManager per machine, because you save on "management overhead" and you benefit from faster local data transfers.
3. If you give your TaskManagers, say, 10 GB of heap, it is likely that the process in the OS is using ~12 GB of memory in total (our network stack also uses some off-heap memory). You can fine-tune the memory behavior of RocksDB, but by default it does not use a lot of memory.
4. I would give it at least 2 GB; if you run multiple jobs or larger jobs (high parallelism, many machines, many tasks), then maybe even more.

The layout of the standalone cluster looks good. Where are you planning to write the state checkpoints to? Given that you have 500 GB of state, you should consider how to store that state somewhere reliably. For larger states, it is recommended to have a good network connection between the workers (the machines running the TMs) and the distributed file system (say S3, HDFS, ...).

On Tue, Dec 13, 2016 at 5:41 PM, Avihai Berkovitz <[hidden email]> wrote:
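As a rough illustration of the sizing advice above, a flink-conf.yaml for a machine with around 16 GB of RAM might look like the sketch below. This is not from the thread; the key names follow the Flink 1.x-era configuration of the time, and the numbers are examples only, to be tuned per workload.

```yaml
# Hypothetical sizing for a ~16 GB machine (illustration, not a recommendation).
jobmanager.heap.mb: 2048          # point 4: at least 2 GB for the JobManager
taskmanager.heap.mb: 10240        # point 3: expect ~12 GB total process footprint
taskmanager.numberOfTaskSlots: 8  # point 2: one large TaskManager per machine
```

Note that the ~2 GB gap between configured heap and observed process size comes from off-heap usage (network buffers, RocksDB, JVM overhead), so the OS should be left headroom beyond the heap value.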
Thank you for the answers. The cluster will run in Azure, so I will be using HDFS over Azure Blob Store, as outlined in
http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/Azure-Blob-Storage-Connector-td8536.html I got pretty good performance in my tests, and it should handle scaling well. We will see how it performs under real production loads.
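For the checkpoint setup described here, the configuration might be sketched as below. This is an assumption-laden example: it presumes the Hadoop `wasb://` Azure Blob Store connector from the linked thread is on the classpath, the key names match the Flink 1.x era, and the container/account names are placeholders.

```yaml
# Illustrative checkpoint configuration (account, container, and path are placeholders).
state.backend: rocksdb
# HDFS-over-Azure Blob Store via Hadoop's wasb:// filesystem connector
state.backend.fs.checkpointdir: wasb://mycontainer@myaccount.blob.core.windows.net/flink/checkpoints
```

With ~500 GB of state, checkpoint upload bandwidth between the TaskManagers and the blob store is the main thing to validate before production.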