Thank you Vino for the information.
Best, Ethan
Hi Ethan,
I'd like to share two things:
- I have found that the "taskmanager.memory.preallocate" config option has been removed in the master codebase.
- After researching the git history, I found that the description of "taskmanager.memory.preallocate" was written by [hidden email] (on the 1.8 branch), so maybe he can give more context or information. Correct me if I am wrong.
Best, Vino.
I didn’t realize we were not chatting in the mailing list :)
I think it’s wrong because it kind of says a full GC is triggered by reaching MaxDirectMemorySize.
Glad that helped. I'm also posting this conversation to the public mailing list, in case other people have similar questions.
And regarding the GC statement, I think the document is correct.
- The Flink Memory Manager guarantees that the amount of allocated managed memory never exceeds the configured capacity, so managed memory allocation should not trigger OOM.
- When preallocation is enabled, managed memory segments are allocated and pooled by the Flink Memory Manager, regardless of whether any tasks request them. The segments are not deallocated until the cluster is shut down.
- When preallocation is disabled, managed memory segments are allocated only when tasks request them, and destroyed immediately when tasks return them to the Memory Manager. However, what this statement is trying to say is that the memory is not deallocated directly when the memory segment is destroyed; it is only truly released after a GC.
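In flink-conf.yaml terms, the two modes are switched roughly like this (option names as in the Flink 1.9 docs; this is just an illustration, not a recommendation of either value):
    # Pre-allocation enabled: all managed segments are allocated and pooled
    # at TaskManager start-up and kept until the cluster shuts down.
    taskmanager.memory.preallocate: true
    # Pre-allocation disabled (the alternative): segments are allocated only
    # when a task requests them; after release, the backing memory is truly
    # freed only once GC collects the buffers.
    # taskmanager.memory.preallocate: false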
Thank you very much Xintong! It’s much clearer to me now.
I am still on a standalone cluster setup. Before, I was using 350GB of on-heap memory on a 378GB box. I saw a lot of swap activity. Now I understand that it’s because RocksDB didn’t have enough memory to use, so the OS forced the JVM to swap. It explains why the cluster was not stable and kept crashing.
Now that I put 150GB off-heap and 150GB on-heap, the cluster is more stable than before. I thought it was because GC was reduced now that we have less heap memory. Now I understand that it’s because I have 78GB of memory available for RocksDB to use, 50GB more than before. And it explains why I don’t see swaps anymore.
This makes sense to me now. I just have to set preallocation to false so the other 150GB of off-heap memory can be used by RocksDB, and do some tuning on these memory configs.
Regarding this sentence in the documentation:
"If this configuration is set to false cleaning up of the allocated off-heap memory happens only when the configured JVM parameter MaxDirectMemorySize is reached by triggering a full GC"
I think this statement is not correct. GC is not triggered by reaching MaxDirectMemorySize. It will throw "java.lang.OutOfMemoryError: Direct buffer memory" if MaxDirectMemorySize is reached.
Thank you again for your help!
Best, Ethan
Hi Ethan,
When you say "it's doing better than before", what was your setup before? Was it on-heap managed memory? With preallocation enabled or disabled? Also, what deployment (standalone, yarn, or local executor) do you run Flink on? It's hard to tell why the performance became better without knowing the information above.
Since you are using RocksDB and configure managed memory to off-heap, you should set pre-allocation to false. Streaming jobs with the RocksDB state backend do not use managed memory at all. Setting managed memory to off-heap only makes Flink launch the JVM with a smaller heap, leaving more space outside the JVM. Setting pre-allocation to false makes Flink allocate managed memory on demand, and since there is no demand, the managed memory will not be allocated. Therefore, the memory left outside the JVM can be fully leveraged by RocksDB.
Regarding related source code, I would recommend the following:
- MemoryManager - for how managed memory is allocated / used. Related to pre-allocation.
- ContaineredTaskManagerParameters - for how the JVM memory parameters are decided. Related to on-heap / off-heap managed memory.
- TaskManagerServices#fromConfiguration - for how the different components are created, as well as how their memory sizes are decided. Also related to on-heap / off-heap managed memory.
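Put together, the setup described above would look roughly like this in flink-conf.yaml (option names as in the Flink 1.9 documentation; treat this as a sketch, not a tuned configuration):
    # Keep managed memory off-heap so the JVM heap stays small and more
    # memory is left outside the JVM for RocksDB.
    taskmanager.memory.off-heap: true
    # Do not pre-allocate; a streaming job with RocksDB never requests
    # managed memory, so none will actually be allocated.
    taskmanager.memory.preallocate: false
    # The managed memory budget itself is controlled separately, via
    # taskmanager.memory.size or taskmanager.memory.fraction.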
Thank you Xintong and Vino for taking the time to answer my question. I didn’t know managed memory is only used for batch jobs.
Configuring an off-heap state backend like RocksDB means either also setting managed memory to off-heap or adjusting the cutoff ratio, to dedicate less memory to the JVM heap.
We use RocksDB too, so I guess I was doing that correctly by accident. So the question here is: in this case, should we set preallocate to true or false?
If set to true, the TM will allocate memory off-heap during startup. Will this part of the memory be used by RocksDB? If set to false, how is this off-heap memory managed? Will the allocated memory ever be cleaned up and reused?
I’d really appreciate it if you or anyone from the community could share some ideas or point me to the code. I am reading the source code but haven’t gotten there yet.
Thank you very much!
Best, Ethan
Hi Ethan,
Currently, managed memory is only used for batch jobs (DataSet / Blink SQL). Setting it to off-heap and enabling pre-allocation can improve the performance of using managed memory. However, since you are running streaming jobs, which "currently do not use the managed memory", I would suggest setting managed memory to on-heap and disabling pre-allocation. In this way, Flink will not allocate any managed memory segments that are not actually used, and the corresponding memory can still be used for other JVM heap purposes.
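In flink-conf.yaml terms (Flink 1.9 and earlier option names), that suggestion is roughly:
    taskmanager.memory.off-heap: false     # keep managed memory on the JVM heap
    taskmanager.memory.preallocate: false  # allocate managed segments only on demand
(If I remember correctly, both of these are also the default values.)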
The above is for Flink 1.9 and earlier. In the upcoming Flink 1.10, we are removing the pre-allocation of managed memory, making managed memory always off-heap, and making the RocksDB state backend use managed memory. This means the two config options you mentioned will no longer exist in future releases. In case you're planning to migrate to the upcoming Flink 1.10: if your streaming jobs use the RocksDB state backend, then hopefully it will not be necessary for you to change any configuration; but if your jobs use the heap state backend, it would be better to configure the managed memory size / fraction to 0, because otherwise the corresponding memory cannot be used by any component.
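For reference, in the 1.10 memory model that last point would be expressed with something like the following option (name as planned for 1.10; treat it as a sketch):
    # Only if your jobs use the heap state backend and nothing else needs
    # managed memory:
    taskmanager.memory.managed.size: 0m
or the equivalent taskmanager.memory.managed.fraction.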
Thank you~
Xintong Song
Hi Community,
We have a large-memory box, so as suggested we should use off-heap memory for Flink managed memory. The doc then suggests setting taskmanager.memory.preallocate to true. However,
"For streaming setups is is highly recommended to set this value to false as the core state backends currently do not use the managed memory."
Our Flink setup is mainly for streaming jobs, so I think the above applies to our case. So should I use off-heap with the "preallocate" setting set to false? What would be the impact of these configs?
Thank you very much!
Best, Ethan