Hi all,
Is there any way to increase the sort buffer size other than increasing the overall TaskManager memory? The following error comes up when running a job with huge matrix block objects on a cluster:

Error obtaining the sorted input: Thread 'SortMerger Reading Thread' terminated due to an exception: The record exceeds the maximum size of a sort buffer (current maximum: 100499456 bytes).

Every TM has at least 40 GB of memory, while the maximum sort buffer size is around 100 MB. What is the reason for this limit? Sorry if I'm missing something, but I have not found any related discussion or documentation yet.

Cheers,
Gabor
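For context, a minimal sketch of the legacy 1.x-era memory options a setup like this would use in flink-conf.yaml; the key names are real, but the values are assumptions about this cluster, not taken from the thread:

taskmanager.heap.mb: 40960         # total TaskManager heap, ~40 GB
taskmanager.memory.fraction: 0.7   # share of free heap reserved for managed memory (sorters, hash tables)

Raising these grows the managed-memory pool as a whole, but, as the error message above shows, a single record still has to fit into one sort buffer.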
Hi Gabor,

I don't think there is a way to tune the memory settings for specific operators. As far as I can tell, the per-operator memory fractions are assigned by the optimizer when the plan is finalized, see [1] and [2].

[1] https://github.com/apache/flink/blob/master/flink-optimizer/src/main/java/org/apache/flink/optimizer/Optimizer.java
[2] https://github.com/apache/flink/blob/master/flink-optimizer/src/main/java/org/apache/flink/optimizer/traversals/PlanFinalizer.java
Posting an update here, because it came up again:
Have a look at https://issues.apache.org/jira/browse/FLINK-17192, specifically this comment:

> There is a hidden/experimental feature in the sorter to offload large records, but it is not active by default.
>
> You can try and add "taskmanager.runtime.large-record-handler: true" to the config.
>
> The reason why it is a "hidden feature" is that it has some restrictions: the key must be serializable by Flink's default serializers and recognized by the TypeExtractor, meaning you cannot use a custom serializer or specific type information.
>
> For keys that are strings, ints, longs, arrays, simple POJOs, etc. it should be fine. For keys that are Avro types with a specific schema, or types with custom serializers (including custom Kryo serializers), it might not work.
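To make the restriction concrete, here is a minimal sketch of the kind of DataSet job this applies to. The flag is set programmatically on a local environment purely for illustration (on a cluster it would go into flink-conf.yaml), the record shape is an assumption standing in for "huge matrix block" objects, and the String key is one the TypeExtractor recognizes, so the large-record handler can fall back to Flink's default serializers:

import org.apache.flink.api.java.DataSet;
import org.apache.flink.api.java.ExecutionEnvironment;
import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.configuration.Configuration;
import org.apache.flink.util.Collector;

public class LargeRecordSortSketch {
    public static void main(String[] args) throws Exception {
        // Enable the hidden/experimental large-record handler
        // (normally set in flink-conf.yaml rather than in code).
        Configuration conf = new Configuration();
        conf.setBoolean("taskmanager.runtime.large-record-handler", true);
        ExecutionEnvironment env = ExecutionEnvironment.createLocalEnvironment(conf);

        // Stand-in for huge matrix-block records: a simple String key plus a large payload.
        DataSet<Tuple2<String, double[]>> blocks = env.fromElements(
                Tuple2.of("block-1", new double[1_000_000]),
                Tuple2.of("block-2", new double[2_000_000]));

        // The groupBy triggers a sort; records too big for a sort buffer are the
        // ones the large-record handler would offload, keyed by the String field.
        blocks.groupBy(0)
              .reduceGroup((Iterable<Tuple2<String, double[]>> values, Collector<String> out) -> {
                  for (Tuple2<String, double[]> v : values) {
                      out.collect(v.f0 + ": " + v.f1.length + " doubles");
                  }
              })
              .returns(String.class)
              .print();
    }
}

Whether the offload actually kicks in depends on record size relative to the sort buffer, so treat this as the shape of such a job rather than a verified reproduction.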