OutOfMemory error (Direct buffer memory) while allocating the TaskManager off-heap memory

Yassine MARZOUGUI
Hi all,

I tried starting a local Flink 1.2.0 cluster using start-local.sh, with the following settings for the taskmanager memory:

taskmanager.heap.mb: 16384
taskmanager.memory.off-heap: true
taskmanager.memory.preallocate: true

That throws an OOM error:
Caused by: java.lang.Exception: OutOfMemory error (Direct buffer memory) while allocating the TaskManager off-heap memory (39017161219 bytes). Try increasing the maximum direct memory (-XX:MaxDirectMemorySize)

However, if I add an absolute taskmanager.memory.size:
taskmanager.memory.size: 15360
the cluster starts successfully.

My understanding is that if taskmanager.memory.size is unspecified, it should default to 0.7 * taskmanager.heap.mb. So I don't understand why startup throws an exception in the first case, yet works when the size is set to a value larger than that fraction.
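
As a quick check of the numbers above, here is a small Python sketch of the arithmetic. It is not Flink code. It assumes that "mb" in the config means 2**20 bytes and that 0.7 is the default fraction, as stated above. The last step is a hypothetical reading only: it assumes the off-heap size might be derived as heap / (1 - fraction) * fraction, which nothing in this thread confirms.

```python
MIB = 1024 * 1024  # assumption: "mb" in the config means 2**20 bytes

heap_mb = 16384                # taskmanager.heap.mb from the config above
fraction = 0.7                 # default fraction, as stated in the post
reported_bytes = 39017161219   # allocation size from the exception message

# What 0.7 * taskmanager.heap.mb would give.
expected_mb = heap_mb * fraction

# What the TaskManager actually tried to allocate.
reported_mb = reported_bytes / MIB

# Hypothetical reading (an assumption, not confirmed in this thread):
# off_heap = heap / (1 - fraction) * fraction, solved here for heap.
implied_heap_mb = reported_mb * (1 - fraction) / fraction

print(f"expected off-heap size: {expected_mb:.0f} MB")
print(f"reported allocation:    {reported_mb:.0f} MB")
print(f"ratio reported/heap:    {reported_mb / heap_mb:.2f}")
print(f"implied heap (assumed): {implied_heap_mb:.0f} MB")
```

The reported allocation (about 37210 MB) is more than twice the whole configured heap, so it cannot be the result of 0.7 * taskmanager.heap.mb (about 11469 MB).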

Any help is appreciated.

Best,
Yassine

Re: OutOfMemory error (Direct buffer memory) while allocating the TaskManager off-heap memory

Nico Kruber
Hi Yassine,
Thanks for reporting this. The problem you ran into is caused by
start-local.sh, which we discourage in favour of start-cluster.sh because the
latter resembles real use cases better.

In your case, start-local.sh starts a job manager with an embedded task
manager but does not parse the task manager config properly to set the right
parameters.
With start-cluster.sh (or manually via jobmanager.sh and taskmanager.sh), the
job manager and task manager are started in separate JVMs, which avoids this
issue.

There is an open pull request [1] that makes start-cluster.sh cooperate better
in certain situations so that it can replace start-local.sh completely, but it
has not been merged yet, and start-local.sh has not been replaced. We might do
that in the future.

FYI: I created a Jira issue for us to check further code paths that may lead
to this problem:
https://issues.apache.org/jira/browse/FLINK-5973


Regards
Nico

[1] https://github.com/apache/flink/pull/3298

On Friday, 3 March 2017 17:07:14 CET Yassine MARZOUGUI wrote:

> Hi all,
>
> I tried starting a local Flink 1.2.0 cluster using start-local.sh, with the
> following settings for the taskmanager memory:
>
> taskmanager.heap.mb: 16384
> taskmanager.memory.off-heap: true
> taskmanager.memory.preallocate: true
>
> That throws an OOM error:
> Caused by: java.lang.Exception: OutOfMemory error (Direct buffer memory)
> while allocating the TaskManager off-heap memory (39017161219 bytes). Try
> increasing the maximum direct memory (-XX:MaxDirectMemorySize)
>
> However, if I add an absolute taskmanager.memory.size:
> taskmanager.memory.size: 15360
> the cluster starts successfully.
>
> My understanding is that if taskmanager.memory.size is unspecified then it
> should be equal to 0.7 * taskmanager.heap.mb. So I don't understand why it
> throws an exception and it works if it's larger than that fraction.
>
> Any help is appreciated.
>
> Best,
> Yassine