Hello,

My name is Allen, and I'm currently researching different distributed execution engines. I wanted to run some benchmarks on Flink with a 10-node cluster (each node has 64 vCPUs and 376 GB of memory). I ran the program with parallelism 320 and got this error message:

"Caused by: java.io.IOException: Insufficient number of network buffers: required 320, but only 128 available. The total number of network buffers is currently set to 32768 of 32768 bytes each. You can increase this number by setting the configuration keys 'taskmanager.network.memory.fraction', 'taskmanager.network.memory.min', and 'taskmanager.network.memory.max'."

Currently, I set the following parameters:

jobmanager.heap.size: 102400m
taskmanager.memory.size: 102400m
taskmanager.numberOfTaskSlots: 32
taskmanager.network.memory.min: 102400m
taskmanager.network.memory.max: 102400m
taskmanager.network.memory.fraction: 0.5

(Instead of the last three keys, I've also tried setting taskmanager.network.numberOfBuffers: 40960 directly.)

Could you please give me some advice on how I should fix it? Thank you so much!

Best,
Allen
|
Hi Allen,

There are two ways to configure network buffers. The old way, via `taskmanager.network.numberOfBuffers`, is deprecated. The new way uses three parameters: min, max, and fraction. The network memory is computed as Math.min(network.memory.max, Math.max(network.memory.min, network.memory.fraction * jvmMemory)). If both ways are set, only the new way takes effect. You can adjust these three parameters accordingly.

You can also check the TaskManager log by searching for " MB for network buffer pool (number of memory segments: " to confirm whether your settings are working as expected.

Best,
Zhijiang
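To make the formula concrete, here is a minimal sketch in Java. The class and method names are hypothetical (this is not Flink's API). It assumes the 32768-byte segment size reported in the error message, and the JVM memory value is an illustrative assumption. It plugs in the min/max/fraction values from the configuration in the question and divides the result by the segment size to get a buffer count.

```java
// Hypothetical helper illustrating the network-memory formula; not part of Flink's API.
public class NetworkBufferEstimate {

    // Assumed segment size in bytes, taken from "32768 bytes each" in the error message.
    static final long SEGMENT_SIZE_BYTES = 32768L;

    // network memory = min(max, max(min, fraction * jvmMemory))
    static long networkMemoryBytes(long minBytes, long maxBytes, double fraction, long jvmMemoryBytes) {
        long fromFraction = (long) (fraction * jvmMemoryBytes);
        return Math.min(maxBytes, Math.max(minBytes, fromFraction));
    }

    // Number of network buffers (memory segments) that fit into the given network memory.
    static long numberOfBuffers(long networkBytes) {
        return networkBytes / SEGMENT_SIZE_BYTES;
    }

    public static void main(String[] args) {
        long mib = 1024L * 1024L;
        long min = 102400L * mib;  // taskmanager.network.memory.min: 102400m
        long max = 102400L * mib;  // taskmanager.network.memory.max: 102400m
        double fraction = 0.5;     // taskmanager.network.memory.fraction: 0.5
        long jvm = 102400L * mib;  // assumed JVM memory size, for illustration only

        long net = networkMemoryBytes(min, max, fraction, jvm);
        System.out.println("network memory (MB): " + (net / mib));
        System.out.println("buffers: " + numberOfBuffers(net));

        // Pool size implied by the error message: 32768 buffers of 32768 bytes each.
        System.out.println("pool in error message (MB): " + (32768L * SEGMENT_SIZE_BYTES / mib));
    }
}
```

Because min and max are both 102400m, the result is pinned at 102400 MB (3,276,800 buffers) regardless of the fraction or JVM size. The pool described in the error message, 32768 buffers of 32768 bytes, works out to only 1024 MB. Comparing the two numbers against the TaskManager log line is one way to see which settings are actually in effect.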