Hi to all,

we wrote a small JUnit test to reproduce a memory issue we have in a Flink job (it seems to be related to Netty). At some point, usually around the 28th loop, the job fails with the following exception (we have never actually seen it in production, but maybe it is related to the memory issue somehow...):

Caused by: java.lang.IllegalAccessError: org/apache/flink/runtime/io/network/netty/NettyMessage
    at io.netty.util.internal.__matchers__.org.apache.flink.runtime.io.network.netty.NettyMessageMatcher.match(NoOpTypeParameterMatcher.java)
    at io.netty.channel.SimpleChannelInboundHandler.acceptInboundMessage(SimpleChannelInboundHandler.java:95)
    at io.netty.channel.SimpleChannelInboundHandler.channelRead(SimpleChannelInboundHandler.java:102)
    ... 16 more

The GitHub project is https://github.com/okkam-it/flink-memory-leak and the JUnit test is contained in the MemoryLeakTest class (within src/main/test).

Thanks in advance for any support,
Flavio
|
I can confirm that the issue is reproducible with the given test, both from the command line and from the IDE.

While cutting down the test case, by replacing the output format with a DiscardingOutputFormat and the JDBCInputFormat with a simple collection, I stumbled onto a new exception after ~200 iterations:

org.apache.flink.runtime.client.JobExecutionException: Job execution failed.

Caused by: java.io.IOException: Insufficient number of network buffers: required 4, but only 1 available. The total number of network buffers is currently set to 5691 of 32768 bytes each. You can increase this number by setting the configuration keys 'taskmanager.network.memory.fraction', 'taskmanager.network.memory.min', and 'taskmanager.network.memory.max'.
    at org.apache.flink.runtime.io.network.buffer.NetworkBufferPool.createBufferPool(NetworkBufferPool.java:195)
    at org.apache.flink.runtime.io.network.NetworkEnvironment.registerTask(NetworkEnvironment.java:186)
    at org.apache.flink.runtime.taskmanager.Task.run(Task.java:602)
    at java.lang.Thread.run(Thread.java:745)

On 11.10.2017 12:48, Flavio Pompermaier wrote:
|
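To put the numbers from the error message above in perspective, here is a small arithmetic sketch. The first method only multiplies the two figures quoted in the message (5691 buffers of 32768 bytes each, roughly 177.8 MiB). The second method is an assumption about how the three configuration keys named in the message interact (a fraction of the JVM memory, clamped between a minimum and a maximum, then divided by the buffer size); it is not taken from the Flink sources, and the class name, method names and the inputs in main are hypothetical.

```java
/** Back-of-the-envelope arithmetic for the figures quoted in the error message. */
final class NetworkBufferMath {

    /** Total bytes held by the pool: number of buffers times the size of one buffer. */
    static long totalBytes(long numBuffers, long bufferSizeBytes) {
        return Math.multiplyExact(numBuffers, bufferSizeBytes);
    }

    /**
     * ASSUMPTION (not verified against Flink): the network memory is a fraction of the
     * JVM memory, clamped to [minBytes, maxBytes], and then split into fixed-size buffers.
     */
    static long buffersFor(long jvmBytes, double fraction,
                           long minBytes, long maxBytes, long bufferSizeBytes) {
        long wanted = (long) (jvmBytes * fraction);
        long clamped = Math.max(minBytes, Math.min(maxBytes, wanted));
        return clamped / bufferSizeBytes;
    }

    public static void main(String[] args) {
        // Figures taken verbatim from the error message in this thread.
        long bytes = totalBytes(5691, 32768);
        System.out.println(bytes + " bytes = " + bytes / (1024.0 * 1024.0) + " MiB");

        // Hypothetical inputs, chosen only to show the clamping behaviour.
        long mib = 1024L * 1024L;
        System.out.println(buffersFor(100 * mib, 0.1, 64 * mib, 1024 * mib, 32768));
        System.out.println(buffersFor(100 * 1024 * mib, 0.1, 64 * mib, 1024 * mib, 32768));
    }
}
```

If the assumption holds, raising the fraction only helps while the result stays below the configured maximum, so the maximum may need to be raised together with it.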
@Chesnay: Recycling of network resources happens after the tasks go into state FINISHED. Since we are submitting new jobs in a local loop here, it can easily happen that the new job is submitted before enough buffers are available again. At least, that was the case previously. I'm CC'ing Nico, who refactored the network buffer distribution recently and who might have more details about this specific error message.

@Nico: another question is why there seem to be more buffers available but we don't assign them. I'm referring to this part of the error message: "5691 of 32768 bytes...".

On Wed, Oct 11, 2017 at 2:54 PM, Chesnay Schepler <[hidden email]> wrote:
|
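The timing problem described in the reply above (a new job is submitted before the buffers of the finished one have been recycled) can be illustrated with a toy model. This is a deliberately simplified sketch, not Flink's NetworkBufferPool: the class ToyBufferPool and its methods are hypothetical, and the pool size of 5 is chosen only so that the failure reproduces the "required 4, but only 1 available" wording from the thread.

```java
/** Toy fixed-size buffer pool; a stand-in for illustration only, not Flink code. */
final class ToyBufferPool {
    private final int total;
    private int available;

    ToyBufferPool(int total) {
        this.total = total;
        this.available = total;
    }

    /** Takes the requested buffers, or fails if fewer are currently free. */
    void allocate(int required) {
        if (required > available) {
            throw new IllegalStateException("Insufficient number of network buffers: required "
                    + required + ", but only " + available + " available.");
        }
        available -= required;
    }

    /** Returns buffers to the pool; in this model it runs some time after a job finishes. */
    void recycle(int count) {
        available = Math.min(total, available + count);
    }

    int available() {
        return available;
    }

    public static void main(String[] args) {
        ToyBufferPool pool = new ToyBufferPool(5);
        pool.allocate(4); // job 1 is running and holds 4 buffers

        // Job 1 has finished, but its buffers have not been recycled yet.
        try {
            pool.allocate(4); // job 2 is submitted too early
        } catch (IllegalStateException e) {
            System.out.println("job 2 rejected: " + e.getMessage());
        }

        pool.recycle(4);  // recycling of job 1's buffers completes
        pool.allocate(4); // resubmitting job 2 now succeeds
        System.out.println("job 2 accepted, buffers left: " + pool.available());
    }
}
```

In this model a short wait or a retry between loop iterations would avoid the failure. Whether that is the whole story for the real test is exactly the open question raised above, since the error message reports far more buffers in total than the single one available.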
Any update on this? Do you want me to create a JIRA issue for this bug?

On 11 Oct 2017 17:14, "Ufuk Celebi" <[hidden email]> wrote:
|