Use jvm to run flink on single-node machine with many cores

Ana M. Martinez
Hi all,

I am trying to run a program that uses the Flink Java library with ExecutionEnvironment.getExecutionEnvironment() from the command line using java -jar.

If I run the code on my machine (with four cores) or on a multi-node cluster (using YARN), the program runs normally, but if I run it on a single-node machine with 32 cores using java -jar, I get the following error:

02/21/2016 13:33:09 MapPartition (MapPartition at toBatches(ConversionToBatches.java:55))(29/32) switched to FAILED
java.io.IOException: Insufficient number of network buffers: required 1, but only 0 available. The total number of network buffers is currently set to 2048. You can increase this number by setting the configuration key 'taskmanager.network.numberOfBuffers'.
    at org.apache.flink.runtime.io.network.buffer.NetworkBufferPool.createBufferPool(NetworkBufferPool.java:196)
    at org.apache.flink.runtime.io.network.NetworkEnvironment.registerTask(NetworkEnvironment.java:325)
    at org.apache.flink.runtime.taskmanager.Task.run(Task.java:488)
    at java.lang.Thread.run(Thread.java:745)

In this case (java -jar), I don’t know whether or how I can increase the number of network buffers. Is there a way to do it without having to use YARN (as I don’t have Hadoop installed)?

Thanks,
Ana
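
For context on the error above, a rough estimate may help explain why four cores work and 32 do not. The sketch below assumes the rule of thumb from the Flink configuration documentation of that period for 'taskmanager.network.numberOfBuffers' (slots per TaskManager squared, times the number of TaskManagers, times 4) and assumes that a local environment uses one slot per core. The class and method names are hypothetical, and the result is an estimate, not the exact number a given job needs.

```java
/** Rule-of-thumb estimate for the legacy 'taskmanager.network.numberOfBuffers' setting. */
final class NetworkBufferEstimate {

    // Total number of buffers reported in the error message above.
    static final long DEFAULT_BUFFERS = 2048L;

    /**
     * Assumed rule of thumb: slotsPerTaskManager^2 * taskManagers * 4.
     * This is a sizing heuristic, not an exact requirement.
     */
    static long recommendedBuffers(int slotsPerTaskManager, int taskManagers) {
        return (long) slotsPerTaskManager * slotsPerTaskManager * taskManagers * 4L;
    }

    public static void main(String[] args) {
        // Compare the 4-core machine with the 32-core machine, one local TaskManager each.
        for (int slots : new int[] {4, 32}) {
            long needed = recommendedBuffers(slots, 1);
            String verdict = needed <= DEFAULT_BUFFERS ? "within" : "exceeds";
            System.out.printf("%d slots: about %d buffers (%s the default of %d)%n",
                    slots, needed, verdict, DEFAULT_BUFFERS);
        }
    }
}
```

Under these assumptions, 4 slots need about 64 buffers, while 32 slots need about 4096, which is above the 2048 reported in the exception.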


Re: Use jvm to run flink on single-node machine with many cores

rmetzger0
Hi Ana,

You can also create a StreamExecutionEnvironment by passing a Configuration object, in which you can set the number of network buffers:


// set up the execution environment
Configuration conf = new Configuration();
conf.setBoolean("taskmanager.network.numberOfBuffers", "16000");
final StreamExecutionEnvironment env = LocalStreamEnvironment.createLocalEnvironment(8, conf);



Re: Use jvm to run flink on single-node machine with many cores

Márton Balassi
In reply to this post by Ana M. Martinez
Dear Ana,

If you are using a single machine with multiple cores but need convenient access to the configuration, I would personally recommend the local cluster option in the Flink distribution [1]. If you want to avoid having a Flink distribution on the machine, then Robert's solution is the way to go.

[1] https://ci.apache.org/projects/flink/flink-docs-master/setup/local_setup.html




Re: Use jvm to run flink on single-node machine with many cores

Ufuk Celebi
Note that the method to call in the example should be
`conf.setInteger` and the second argument not a String but an int.
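
With that correction applied, the streaming example might look like the sketch below. It assumes a Flink 1.x release with flink-streaming-java on the classpath, where Configuration.setInteger(String, int) and StreamExecutionEnvironment.createLocalEnvironment(int, Configuration) are available. The class name, the sample elements, and the job name are placeholders and not part of the original example.

```java
import org.apache.flink.configuration.Configuration;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class LocalStreamingSetup {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // setInteger with an int value, not setBoolean with a String
        conf.setInteger("taskmanager.network.numberOfBuffers", 16000);

        // Local environment with parallelism 8, as in the example being corrected
        StreamExecutionEnvironment env =
                StreamExecutionEnvironment.createLocalEnvironment(8, conf);

        // Placeholder pipeline so that there is a job to execute
        env.fromElements(1, 2, 3).print();
        env.execute("local-streaming-setup");
    }
}
```

Later Flink releases replaced this configuration key with memory-based network settings, so the key shown here should be treated as specific to the version discussed in this thread.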

On Sun, Feb 21, 2016 at 1:41 PM, Márton Balassi
<[hidden email]> wrote:

> Dear Ana,
>
> If you are using a single machine with multiple cores, but need convenient
> access to the configuration I would personally recommend using the local
> cluster option in the flink distribution. [1] If you want to avoid having a
> flink distro on the machine, then Robert's solution is the way to go.
>
> [1]
> https://ci.apache.org/projects/flink/flink-docs-master/setup/local_setup.html

Re: Use jvm to run flink on single-node machine with many cores

Ana M. Martinez
Hi all,

Thank you very much for your help. It worked perfectly like this:

Configuration conf = new Configuration();
conf.setInteger("taskmanager.network.numberOfBuffers", 16000);
conf.setInteger("taskmanager.numberOfTaskSlots", 32);
final ExecutionEnvironment env = ExecutionEnvironment.createLocalEnvironment(conf);
env.setParallelism(32);

I believe that setting taskmanager.numberOfTaskSlots is not necessary, but setParallelism is, as a parallelism of 1 was taken by default.

Best regards,
Ana



Re: Use jvm to run flink on single-node machine with many cores

Ufuk Celebi
On Tue, Feb 23, 2016 at 10:17 AM, Ana M. Martinez <[hidden email]> wrote:
> I believe that setting taskmanager.numberOfTaskSlots is not necessary, but
> setParallelism is, as by default 1 was taken.

Yes, the number of slots in local execution defaults to the maximum
parallelism of the job.