OutOfMemoryError


OutOfMemoryError

Paulo Cezar
Hi folks,

I'm trying to run a DataSet program, but after around 200k records are processed, it fails with a "java.lang.OutOfMemoryError: unable to create new native thread".

I'm deploying Flink (via bin/yarn-session.sh) on a YARN cluster with 10 nodes (each with 8 cores) and starting 10 task managers, each with 8 slots and 6GB of RAM.
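(For reference, I'm starting the session with something like:

    ./bin/yarn-session.sh -n 10 -s 8 -tm 6144

i.e. 10 TaskManager containers, 8 slots each, 6GB per container.)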

Except for the data sink, which writes to HDFS and runs with a parallelism of 1, my job runs with a parallelism of 80 and has two input datasets, each an HDFS file of around 6GB and 20 million lines. Most of my map functions use external services via RPC or REST APIs to enrich the raw data with info from other sources.

Might I be doing something wrong, or should I really have more memory available?

Thanks,
Paulo Cezar
Re: OutOfMemoryError

Stephan Ewen
My guess would be that you have a thread leak in the user code.
More memory will not solve the problem; it will only push it a bit further away.
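
A quick way to verify this is to log the JVM's live thread count from inside one of your functions: if it grows steadily with the number of processed records, some piece of user code is spawning threads without ever shutting them down. A minimal sketch (the class is purely illustrative, plug in your own logic):

    import java.lang.management.ManagementFactory;

    import org.apache.flink.api.common.functions.RichMapFunction;

    // Illustrative pass-through mapper that logs the JVM's live thread
    // count every 10,000 records. A steadily growing count points to a
    // thread leak in the user code.
    public class ThreadCountingMapper extends RichMapFunction<String, String> {

        private transient long seen;

        @Override
        public String map(String value) {
            if (++seen % 10_000 == 0) {
                int live = ManagementFactory.getThreadMXBean().getThreadCount();
                System.out.println("records: " + seen + ", live threads: " + live);
            }
            return value; // replace with your enrichment logic
        }
    }

Typical culprits are HTTP or RPC clients that are created per record (or per call) instead of once per task in open() and shut down in close().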


Re: OutOfMemoryError

Paulo Cezar
Thanks Stephan, I had a MapFunction using Unirest, and that was the origin of the leak.
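
In case it helps someone else: Unirest spins up background threads that stay alive until you explicitly shut the library down, which my job never did. Moving the HTTP calls into a RichMapFunction and calling Unirest.shutdown() in close() fixed it. Roughly (a sketch against the Unirest 1.x API; error handling omitted and the endpoint is made up):

    import org.apache.flink.api.common.functions.RichMapFunction;

    import com.mashape.unirest.http.Unirest;

    // Sketch of the fixed enrichment step: Unirest's shared thread pool
    // is released in close(), so threads no longer pile up.
    public class EnrichMapper extends RichMapFunction<String, String> {

        @Override
        public String map(String value) throws Exception {
            String extra = Unirest.get("http://example.com/lookup")
                    .queryString("q", value)
                    .asString()
                    .getBody();
            return value + "," + extra;
        }

        @Override
        public void close() throws Exception {
            // Shuts down Unirest's background threads (declared to throw
            // IOException in 1.x, covered by close()'s throws clause).
            Unirest.shutdown();
        }
    }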
