OutOfMemoryError


OutOfMemoryError

Paulo Cezar
Hi folks,

I'm trying to run a DataSet program, but after around 200k records are processed, it fails with a "java.lang.OutOfMemoryError: unable to create new native thread".

I'm deploying Flink (via bin/yarn-session.sh) on a YARN cluster with 10 nodes (each with 8 cores) and starting 10 task managers, each with 8 slots and 6GB of RAM.
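(For reference, I'm starting the session with something like:

    ./bin/yarn-session.sh -n 10 -s 8 -tm 6144

i.e. 10 TaskManager containers, 8 slots each, 6GB per container.)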

Except for the data sink, which writes to HDFS and runs with a parallelism of 1, my job runs with a parallelism of 80 and has two input datasets, each an HDFS file of around 6GB and 20 million lines. Most of my map functions use external services via RPC or REST APIs to enrich the raw data with info from other sources.

Might I be doing something wrong, or should I really have more memory available?

Thanks,
Paulo Cezar
Re: OutOfMemoryError

Stephan Ewen
My guess would be that you have a thread leak in the user code.
More memory will not solve the problem; it will only push it a bit further away.
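
A quick way to verify this is to log the JVM's live thread count from inside one of your functions: if it grows steadily with the number of processed records, some piece of user code is spawning threads without ever shutting them down. A minimal sketch (the class is purely illustrative, plug in your own logic):

    import java.lang.management.ManagementFactory;

    import org.apache.flink.api.common.functions.RichMapFunction;

    // Illustrative pass-through mapper that logs the JVM's live thread
    // count every 10,000 records. A steadily growing count points to a
    // thread leak in the user code.
    public class ThreadCountingMapper extends RichMapFunction<String, String> {

        private transient long seen;

        @Override
        public String map(String value) {
            if (++seen % 10_000 == 0) {
                int live = ManagementFactory.getThreadMXBean().getThreadCount();
                System.out.println("records: " + seen + ", live threads: " + live);
            }
            return value; // replace with your enrichment logic
        }
    }

Typical culprits are HTTP or RPC clients that are created per record (or per call) instead of once per task in open() and shut down in close().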


Re: OutOfMemoryError

Paulo Cezar
Thanks Stephan, I had a MapFunction using Unirest, and that was the origin of the leak.
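
In case it helps someone else: Unirest spins up background threads that stay alive until you explicitly shut the library down, which my job never did. Moving the HTTP calls into a RichMapFunction and calling Unirest.shutdown() in close() fixed it. Roughly (a sketch against the Unirest 1.x API; error handling omitted and the endpoint is made up):

    import org.apache.flink.api.common.functions.RichMapFunction;

    import com.mashape.unirest.http.Unirest;

    // Sketch of the fixed enrichment step: Unirest's shared thread pool
    // is released in close(), so threads no longer pile up.
    public class EnrichMapper extends RichMapFunction<String, String> {

        @Override
        public String map(String value) throws Exception {
            String extra = Unirest.get("http://example.com/lookup")
                    .queryString("q", value)
                    .asString()
                    .getBody();
            return value + "," + extra;
        }

        @Override
        public void close() throws Exception {
            // Shuts down Unirest's background threads (declared to throw
            // IOException in 1.x, covered by close()'s throws clause).
            Unirest.shutdown();
        }
    }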
