Login  Register

Netty LocalTransportException: Sending the partition request to 'null' failed

classic Classic list List threaded Threaded
7 messages Options Options
Embed post
Permalink
Reply | Threaded
Open this post in threaded view
| More
Print post
Permalink

Netty LocalTransportException: Sending the partition request to 'null' failed

Matthias Seiler
Hi Everyone,

I'm trying to setup a Flink cluster in standealone mode with two
machines. However, running a job throws the following exception:
`org.apache.flink.runtime.io.network.netty.exception.LocalTransportException:
Sending the partition request to 'null' failed`

Here is some background:

Machines:
- node-1: JobManager, TaskManager
- node-2: TaskManager

flink-conf-yaml looks like this:
```
jobmanager.rpc.address: node-1
taskmanager.numberOfTaskSlots: 8
parallelism.default: 2
cluster.evenly-spread-out-slots: true
```

Deploying the cluster works: I can see both TaskManagers in the WebUI.

I ran the streaming WordCount example: `flink run
flink-1.12.1/examples/streaming/WordCount.jar --input lorem-ipsum.txt`
- the job has been submitted
- job failed (with the above exception)
- the log of the node-2 also shows the exception, the other logs are
fine (graceful stop)

I played around with the config and observed that
- if parallelism is set to 1, node-1 gets all the slots and node-2 none
- if parallelism is set to 2, each TaskManager occupies 1 TaskSlot (but
fails because of node-2)

I suspect, that the problem must be with the communication between
TaskManagers
- job runs successful if
    - node-1 is the **only** node with x TaskManagers (tested with x=1
and x=2)
    - node-2 is the **only** node with x TaskManagers (tested with x=1
and x=2)
- job fails if
    - node-1 **and** node-2 have one TaskManager

The full exception is:
```
org.apache.flink.client.program.ProgramInvocationException: The main
method caused an error:
org.apache.flink.client.program.ProgramInvocationException: Job failed
(JobID: 547d4d29b3650883aa403cdb5eb1ba5c)
// ... Job failed, Recovery is suppressed by
NoRestartBackoffTimeStrategy, ...
Caused by:
org.apache.flink.runtime.io.network.netty.exception.LocalTransportException:
Sending the partition request to 'null' failed.
    at
org.apache.flink.runtime.io.network.netty.NettyPartitionRequestClient$1.operationComplete(NettyPartitionRequestClient.java:137)
    at
org.apache.flink.runtime.io.network.netty.NettyPartitionRequestClient$1.operationComplete(NettyPartitionRequestClient.java:125)
    at
org.apache.flink.shaded.netty4.io.netty.util.concurrent.DefaultPromise.notifyListener0(DefaultPromise.java:577)
    at
org.apache.flink.shaded.netty4.io.netty.util.concurrent.DefaultPromise.notifyListenersNow(DefaultPromise.java:551)
    at
org.apache.flink.shaded.netty4.io.netty.util.concurrent.DefaultPromise.notifyListeners(DefaultPromise.java:490)
    at
org.apache.flink.shaded.netty4.io.netty.util.concurrent.DefaultPromise.setValue0(DefaultPromise.java:615)
    at
org.apache.flink.shaded.netty4.io.netty.util.concurrent.DefaultPromise.setFailure0(DefaultPromise.java:608)
    at
org.apache.flink.shaded.netty4.io.netty.util.concurrent.DefaultPromise.tryFailure(DefaultPromise.java:117)
    at
org.apache.flink.shaded.netty4.io.netty.channel.AbstractChannel$AbstractUnsafe.safeSetFailure(AbstractChannel.java:993)
    at
org.apache.flink.shaded.netty4.io.netty.channel.AbstractChannel$AbstractUnsafe.write(AbstractChannel.java:865)
    at
org.apache.flink.shaded.netty4.io.netty.channel.DefaultChannelPipeline$HeadContext.write(DefaultChannelPipeline.java:1367)
    at
org.apache.flink.shaded.netty4.io.netty.channel.AbstractChannelHandlerContext.invokeWrite0(AbstractChannelHandlerContext.java:717)
    at
org.apache.flink.shaded.netty4.io.netty.channel.AbstractChannelHandlerContext.invokeWriteAndFlush(AbstractChannelHandlerContext.java:764)
    at
org.apache.flink.shaded.netty4.io.netty.channel.AbstractChannelHandlerContext$WriteTask.run(AbstractChannelHandlerContext.java:1071)
    at
org.apache.flink.shaded.netty4.io.netty.util.concurrent.AbstractEventExecutor.safeExecute(AbstractEventExecutor.java:164)
    at
org.apache.flink.shaded.netty4.io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:472)
    at
org.apache.flink.shaded.netty4.io.netty.channel.epoll.EpollEventLoop.run(EpollEventLoop.java:384)
    at
org.apache.flink.shaded.netty4.io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:989)
    at
org.apache.flink.shaded.netty4.io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74)
    at java.base/java.lang.Thread.run(Thread.java:834)
Caused by: java.nio.channels.ClosedChannelException
    at
org.apache.flink.shaded.netty4.io.netty.channel.AbstractChannel$AbstractUnsafe.newClosedChannelException(AbstractChannel.java:957)
    ... 11 more
```

Thanks in advance,
Matthias

Reply | Threaded
Open this post in threaded view
| More
Print post
Permalink

Re: Netty LocalTransportException: Sending the partition request to 'null' failed

Till Rohrmann
Hi Matthias,

Can you make sure that node-1 and node-2 can talk to each other? It looks to me that node-2 fails to open a connection to the other TaskManager. Maybe the logs give some more insights. You can change the log level to DEBUG to gather more information.

Cheers,
Till

On Tue, Feb 16, 2021 at 10:57 AM Matthias Seiler <[hidden email]> wrote:
Hi Everyone,

I'm trying to setup a Flink cluster in standealone mode with two
machines. However, running a job throws the following exception:
`org.apache.flink.runtime.io.network.netty.exception.LocalTransportException:
Sending the partition request to 'null' failed`

Here is some background:

Machines:
- node-1: JobManager, TaskManager
- node-2: TaskManager

flink-conf-yaml looks like this:
```
jobmanager.rpc.address: node-1
taskmanager.numberOfTaskSlots: 8
parallelism.default: 2
cluster.evenly-spread-out-slots: true
```

Deploying the cluster works: I can see both TaskManagers in the WebUI.

I ran the streaming WordCount example: `flink run
flink-1.12.1/examples/streaming/WordCount.jar --input lorem-ipsum.txt`
- the job has been submitted
- job failed (with the above exception)
- the log of the node-2 also shows the exception, the other logs are
fine (graceful stop)

I played around with the config and observed that
- if parallelism is set to 1, node-1 gets all the slots and node-2 none
- if parallelism is set to 2, each TaskManager occupies 1 TaskSlot (but
fails because of node-2)

I suspect, that the problem must be with the communication between
TaskManagers
- job runs successful if
    - node-1 is the **only** node with x TaskManagers (tested with x=1
and x=2)
    - node-2 is the **only** node with x TaskManagers (tested with x=1
and x=2)
- job fails if
    - node-1 **and** node-2 have one TaskManager

The full exception is:
```
org.apache.flink.client.program.ProgramInvocationException: The main
method caused an error:
org.apache.flink.client.program.ProgramInvocationException: Job failed
(JobID: 547d4d29b3650883aa403cdb5eb1ba5c)
// ... Job failed, Recovery is suppressed by
NoRestartBackoffTimeStrategy, ...
Caused by:
org.apache.flink.runtime.io.network.netty.exception.LocalTransportException:
Sending the partition request to 'null' failed.
    at
org.apache.flink.runtime.io.network.netty.NettyPartitionRequestClient$1.operationComplete(NettyPartitionRequestClient.java:137)
    at
org.apache.flink.runtime.io.network.netty.NettyPartitionRequestClient$1.operationComplete(NettyPartitionRequestClient.java:125)
    at
org.apache.flink.shaded.netty4.io.netty.util.concurrent.DefaultPromise.notifyListener0(DefaultPromise.java:577)
    at
org.apache.flink.shaded.netty4.io.netty.util.concurrent.DefaultPromise.notifyListenersNow(DefaultPromise.java:551)
    at
org.apache.flink.shaded.netty4.io.netty.util.concurrent.DefaultPromise.notifyListeners(DefaultPromise.java:490)
    at
org.apache.flink.shaded.netty4.io.netty.util.concurrent.DefaultPromise.setValue0(DefaultPromise.java:615)
    at
org.apache.flink.shaded.netty4.io.netty.util.concurrent.DefaultPromise.setFailure0(DefaultPromise.java:608)
    at
org.apache.flink.shaded.netty4.io.netty.util.concurrent.DefaultPromise.tryFailure(DefaultPromise.java:117)
    at
org.apache.flink.shaded.netty4.io.netty.channel.AbstractChannel$AbstractUnsafe.safeSetFailure(AbstractChannel.java:993)
    at
org.apache.flink.shaded.netty4.io.netty.channel.AbstractChannel$AbstractUnsafe.write(AbstractChannel.java:865)
    at
org.apache.flink.shaded.netty4.io.netty.channel.DefaultChannelPipeline$HeadContext.write(DefaultChannelPipeline.java:1367)
    at
org.apache.flink.shaded.netty4.io.netty.channel.AbstractChannelHandlerContext.invokeWrite0(AbstractChannelHandlerContext.java:717)
    at
org.apache.flink.shaded.netty4.io.netty.channel.AbstractChannelHandlerContext.invokeWriteAndFlush(AbstractChannelHandlerContext.java:764)
    at
org.apache.flink.shaded.netty4.io.netty.channel.AbstractChannelHandlerContext$WriteTask.run(AbstractChannelHandlerContext.java:1071)
    at
org.apache.flink.shaded.netty4.io.netty.util.concurrent.AbstractEventExecutor.safeExecute(AbstractEventExecutor.java:164)
    at
org.apache.flink.shaded.netty4.io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:472)
    at
org.apache.flink.shaded.netty4.io.netty.channel.epoll.EpollEventLoop.run(EpollEventLoop.java:384)
    at
org.apache.flink.shaded.netty4.io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:989)
    at
org.apache.flink.shaded.netty4.io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74)
    at java.base/java.lang.Thread.run(Thread.java:834)
Caused by: java.nio.channels.ClosedChannelException
    at
org.apache.flink.shaded.netty4.io.netty.channel.AbstractChannel$AbstractUnsafe.newClosedChannelException(AbstractChannel.java:957)
    ... 11 more
```

Thanks in advance,
Matthias

Reply | Threaded
Open this post in threaded view
| More
Print post
Permalink

Re: Netty LocalTransportException: Sending the partition request to 'null' failed

Matthias Seiler

Hi Till,

thanks for the hint, you seem about right. Setting the log level to DEBUG reveals more information, but I don't know what to do about it.

All logs throw some Java related exceptions:
`java.lang.UnsupportedOperationException: Reflective setAccessible(true) disabled`
and
`java.lang.IllegalAccessException: class org.apache.flink.shaded.netty4.io.netty.util.internal.PlatformDependent0$6 cannot access class jdk.internal.misc.Unsafe (in module java.base) because module java.base does not export jdk.internal.misc to unnamed module`

The log of node-2's TaskManager reveals connection problems:
`org.apache.flink.runtime.net.ConnectionUtils                 [] - Failed to connect from address 'node-2/127.0.1.1': Invalid argument (connect failed)`
`java.net.ConnectException: Invalid argument (connect failed)`

What's more, both TaskManagers (node-1 and node-2) are having trouble to load `org_apache_flink_shaded_netty4_netty_transport_native_epoll_x86_64`, but load some version eventually.


There is quite a lot going on here that I don't understand. Can you (or someone) shed some light on it and tell me what I could try?

Some more information:
I appended the following to the `/etc/hosts` file:
```
<ip-node-1> node-1
<ip-node-2> node-2
```
And the `flink/conf/workers` consists of:
```
node-1
node-2
```

Thanks,
Matthias

P.S. I attached the logs for further reference. `<ip-node-1>` is of course the real IP address instead.


On 2/16/21 1:56 PM, Till Rohrmann wrote:
Hi Matthias,

Can you make sure that node-1 and node-2 can talk to each other? It looks to me that node-2 fails to open a connection to the other TaskManager. Maybe the logs give some more insights. You can change the log level to DEBUG to gather more information.

Cheers,
Till

On Tue, Feb 16, 2021 at 10:57 AM Matthias Seiler <[hidden email]> wrote:
Hi Everyone,

I'm trying to setup a Flink cluster in standealone mode with two
machines. However, running a job throws the following exception:
`org.apache.flink.runtime.io.network.netty.exception.LocalTransportException:
Sending the partition request to 'null' failed`

Here is some background:

Machines:
- node-1: JobManager, TaskManager
- node-2: TaskManager

flink-conf-yaml looks like this:
```
jobmanager.rpc.address: node-1
taskmanager.numberOfTaskSlots: 8
parallelism.default: 2
cluster.evenly-spread-out-slots: true
```

Deploying the cluster works: I can see both TaskManagers in the WebUI.

I ran the streaming WordCount example: `flink run
flink-1.12.1/examples/streaming/WordCount.jar --input lorem-ipsum.txt`
- the job has been submitted
- job failed (with the above exception)
- the log of the node-2 also shows the exception, the other logs are
fine (graceful stop)

I played around with the config and observed that
- if parallelism is set to 1, node-1 gets all the slots and node-2 none
- if parallelism is set to 2, each TaskManager occupies 1 TaskSlot (but
fails because of node-2)

I suspect, that the problem must be with the communication between
TaskManagers
- job runs successful if
    - node-1 is the **only** node with x TaskManagers (tested with x=1
and x=2)
    - node-2 is the **only** node with x TaskManagers (tested with x=1
and x=2)
- job fails if
    - node-1 **and** node-2 have one TaskManager

The full exception is:
```
org.apache.flink.client.program.ProgramInvocationException: The main
method caused an error:
org.apache.flink.client.program.ProgramInvocationException: Job failed
(JobID: 547d4d29b3650883aa403cdb5eb1ba5c)
// ... Job failed, Recovery is suppressed by
NoRestartBackoffTimeStrategy, ...
Caused by:
org.apache.flink.runtime.io.network.netty.exception.LocalTransportException:
Sending the partition request to 'null' failed.
    at
org.apache.flink.runtime.io.network.netty.NettyPartitionRequestClient$1.operationComplete(NettyPartitionRequestClient.java:137)
    at
org.apache.flink.runtime.io.network.netty.NettyPartitionRequestClient$1.operationComplete(NettyPartitionRequestClient.java:125)
    at
org.apache.flink.shaded.netty4.io.netty.util.concurrent.DefaultPromise.notifyListener0(DefaultPromise.java:577)
    at
org.apache.flink.shaded.netty4.io.netty.util.concurrent.DefaultPromise.notifyListenersNow(DefaultPromise.java:551)
    at
org.apache.flink.shaded.netty4.io.netty.util.concurrent.DefaultPromise.notifyListeners(DefaultPromise.java:490)
    at
org.apache.flink.shaded.netty4.io.netty.util.concurrent.DefaultPromise.setValue0(DefaultPromise.java:615)
    at
org.apache.flink.shaded.netty4.io.netty.util.concurrent.DefaultPromise.setFailure0(DefaultPromise.java:608)
    at
org.apache.flink.shaded.netty4.io.netty.util.concurrent.DefaultPromise.tryFailure(DefaultPromise.java:117)
    at
org.apache.flink.shaded.netty4.io.netty.channel.AbstractChannel$AbstractUnsafe.safeSetFailure(AbstractChannel.java:993)
    at
org.apache.flink.shaded.netty4.io.netty.channel.AbstractChannel$AbstractUnsafe.write(AbstractChannel.java:865)
    at
org.apache.flink.shaded.netty4.io.netty.channel.DefaultChannelPipeline$HeadContext.write(DefaultChannelPipeline.java:1367)
    at
org.apache.flink.shaded.netty4.io.netty.channel.AbstractChannelHandlerContext.invokeWrite0(AbstractChannelHandlerContext.java:717)
    at
org.apache.flink.shaded.netty4.io.netty.channel.AbstractChannelHandlerContext.invokeWriteAndFlush(AbstractChannelHandlerContext.java:764)
    at
org.apache.flink.shaded.netty4.io.netty.channel.AbstractChannelHandlerContext$WriteTask.run(AbstractChannelHandlerContext.java:1071)
    at
org.apache.flink.shaded.netty4.io.netty.util.concurrent.AbstractEventExecutor.safeExecute(AbstractEventExecutor.java:164)
    at
org.apache.flink.shaded.netty4.io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:472)
    at
org.apache.flink.shaded.netty4.io.netty.channel.epoll.EpollEventLoop.run(EpollEventLoop.java:384)
    at
org.apache.flink.shaded.netty4.io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:989)
    at
org.apache.flink.shaded.netty4.io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74)
    at java.base/java.lang.Thread.run(Thread.java:834)
Caused by: java.nio.channels.ClosedChannelException
    at
org.apache.flink.shaded.netty4.io.netty.channel.AbstractChannel$AbstractUnsafe.newClosedChannelException(AbstractChannel.java:957)
    ... 11 more
```

Thanks in advance,
Matthias


taskmanager_node-1.log (100K) Download Attachment
taskmanager_node-2.log (107K) Download Attachment
jobmanager.log (238K) Download Attachment
Reply | Threaded
Open this post in threaded view
| More
Print post
Permalink

Re: Netty LocalTransportException: Sending the partition request to 'null' failed

Arvid Heise-4
Hi Matthias,

most of the debug statements are just noise. You can ignore that.

Something with your network seems fishy to me. Either taskmanager 1 cannot connect to taskmanager 2 (and vice versa), or the taskmanager cannot connect locally.

I found this fragment, which seems suspicious

Failed to connect to /127.0.1.1:32797. Giving up.

localhost is usually 127.0.0.1. Can you double check that you connect from all machines to all machines (including themselves) by opening trivial text sockets on random ports?

On Fri, Feb 19, 2021 at 10:59 AM Matthias Seiler <[hidden email]> wrote:

Hi Till,

thanks for the hint, you seem about right. Setting the log level to DEBUG reveals more information, but I don't know what to do about it.

All logs throw some Java related exceptions:
`java.lang.UnsupportedOperationException: Reflective setAccessible(true) disabled`
and
`java.lang.IllegalAccessException: class org.apache.flink.shaded.netty4.io.netty.util.internal.PlatformDependent0$6 cannot access class jdk.internal.misc.Unsafe (in module java.base) because module java.base does not export jdk.internal.misc to unnamed module`

The log of node-2's TaskManager reveals connection problems:
`org.apache.flink.runtime.net.ConnectionUtils                 [] - Failed to connect from address 'node-2/127.0.1.1': Invalid argument (connect failed)`
`java.net.ConnectException: Invalid argument (connect failed)`

What's more, both TaskManagers (node-1 and node-2) are having trouble to load `org_apache_flink_shaded_netty4_netty_transport_native_epoll_x86_64`, but load some version eventually.


There is quite a lot going on here that I don't understand. Can you (or someone) shed some light on it and tell me what I could try?

Some more information:
I appended the following to the `/etc/hosts` file:
```
<ip-node-1> node-1
<ip-node-2> node-2
```
And the `flink/conf/workers` consists of:
```
node-1
node-2
```

Thanks,
Matthias

P.S. I attached the logs for further reference. `<ip-node-1>` is of course the real IP address instead.


On 2/16/21 1:56 PM, Till Rohrmann wrote:
Hi Matthias,

Can you make sure that node-1 and node-2 can talk to each other? It looks to me that node-2 fails to open a connection to the other TaskManager. Maybe the logs give some more insights. You can change the log level to DEBUG to gather more information.

Cheers,
Till

On Tue, Feb 16, 2021 at 10:57 AM Matthias Seiler <[hidden email]> wrote:
Hi Everyone,

I'm trying to setup a Flink cluster in standealone mode with two
machines. However, running a job throws the following exception:
`org.apache.flink.runtime.io.network.netty.exception.LocalTransportException:
Sending the partition request to 'null' failed`

Here is some background:

Machines:
- node-1: JobManager, TaskManager
- node-2: TaskManager

flink-conf-yaml looks like this:
```
jobmanager.rpc.address: node-1
taskmanager.numberOfTaskSlots: 8
parallelism.default: 2
cluster.evenly-spread-out-slots: true
```

Deploying the cluster works: I can see both TaskManagers in the WebUI.

I ran the streaming WordCount example: `flink run
flink-1.12.1/examples/streaming/WordCount.jar --input lorem-ipsum.txt`
- the job has been submitted
- job failed (with the above exception)
- the log of the node-2 also shows the exception, the other logs are
fine (graceful stop)

I played around with the config and observed that
- if parallelism is set to 1, node-1 gets all the slots and node-2 none
- if parallelism is set to 2, each TaskManager occupies 1 TaskSlot (but
fails because of node-2)

I suspect, that the problem must be with the communication between
TaskManagers
- job runs successful if
    - node-1 is the **only** node with x TaskManagers (tested with x=1
and x=2)
    - node-2 is the **only** node with x TaskManagers (tested with x=1
and x=2)
- job fails if
    - node-1 **and** node-2 have one TaskManager

The full exception is:
```
org.apache.flink.client.program.ProgramInvocationException: The main
method caused an error:
org.apache.flink.client.program.ProgramInvocationException: Job failed
(JobID: 547d4d29b3650883aa403cdb5eb1ba5c)
// ... Job failed, Recovery is suppressed by
NoRestartBackoffTimeStrategy, ...
Caused by:
org.apache.flink.runtime.io.network.netty.exception.LocalTransportException:
Sending the partition request to 'null' failed.
    at
org.apache.flink.runtime.io.network.netty.NettyPartitionRequestClient$1.operationComplete(NettyPartitionRequestClient.java:137)
    at
org.apache.flink.runtime.io.network.netty.NettyPartitionRequestClient$1.operationComplete(NettyPartitionRequestClient.java:125)
    at
org.apache.flink.shaded.netty4.io.netty.util.concurrent.DefaultPromise.notifyListener0(DefaultPromise.java:577)
    at
org.apache.flink.shaded.netty4.io.netty.util.concurrent.DefaultPromise.notifyListenersNow(DefaultPromise.java:551)
    at
org.apache.flink.shaded.netty4.io.netty.util.concurrent.DefaultPromise.notifyListeners(DefaultPromise.java:490)
    at
org.apache.flink.shaded.netty4.io.netty.util.concurrent.DefaultPromise.setValue0(DefaultPromise.java:615)
    at
org.apache.flink.shaded.netty4.io.netty.util.concurrent.DefaultPromise.setFailure0(DefaultPromise.java:608)
    at
org.apache.flink.shaded.netty4.io.netty.util.concurrent.DefaultPromise.tryFailure(DefaultPromise.java:117)
    at
org.apache.flink.shaded.netty4.io.netty.channel.AbstractChannel$AbstractUnsafe.safeSetFailure(AbstractChannel.java:993)
    at
org.apache.flink.shaded.netty4.io.netty.channel.AbstractChannel$AbstractUnsafe.write(AbstractChannel.java:865)
    at
org.apache.flink.shaded.netty4.io.netty.channel.DefaultChannelPipeline$HeadContext.write(DefaultChannelPipeline.java:1367)
    at
org.apache.flink.shaded.netty4.io.netty.channel.AbstractChannelHandlerContext.invokeWrite0(AbstractChannelHandlerContext.java:717)
    at
org.apache.flink.shaded.netty4.io.netty.channel.AbstractChannelHandlerContext.invokeWriteAndFlush(AbstractChannelHandlerContext.java:764)
    at
org.apache.flink.shaded.netty4.io.netty.channel.AbstractChannelHandlerContext$WriteTask.run(AbstractChannelHandlerContext.java:1071)
    at
org.apache.flink.shaded.netty4.io.netty.util.concurrent.AbstractEventExecutor.safeExecute(AbstractEventExecutor.java:164)
    at
org.apache.flink.shaded.netty4.io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:472)
    at
org.apache.flink.shaded.netty4.io.netty.channel.epoll.EpollEventLoop.run(EpollEventLoop.java:384)
    at
org.apache.flink.shaded.netty4.io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:989)
    at
org.apache.flink.shaded.netty4.io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74)
    at java.base/java.lang.Thread.run(Thread.java:834)
Caused by: java.nio.channels.ClosedChannelException
    at
org.apache.flink.shaded.netty4.io.netty.channel.AbstractChannel$AbstractUnsafe.newClosedChannelException(AbstractChannel.java:957)
    ... 11 more
```

Thanks in advance,
Matthias

Reply | Threaded
Open this post in threaded view
| More
Print post
Permalink

Re: Netty LocalTransportException: Sending the partition request to 'null' failed

Matthias Seiler

Hi Arvid,

I listened to ports with netcat and connected via telnet and each node can connect to the other and itself.

The `/etc/hosts` file looks like this
```
127.0.0.1   localhost
127.0.1.1   node-2.example.com   node-2

<ip-node-1>   node-1
```
Is the second line the reason it fails? I also replaced all hostnames with IP addresses in the config files (flink-conf, workers, masters) but without effect...

Do you have any ideas what else I could try?

Thanks again,
Matthias

On 2/24/21 2:17 PM, Arvid Heise wrote:
Hi Matthias,

most of the debug statements are just noise. You can ignore that.

Something with your network seems fishy to me. Either taskmanager 1 cannot connect to taskmanager 2 (and vice versa), or the taskmanager cannot connect locally.

I found this fragment, which seems suspicious

Failed to connect to /127.0.1.1:32797. Giving up.

localhost is usually 127.0.0.1. Can you double check that you connect from all machines to all machines (including themselves) by opening trivial text sockets on random ports?

On Fri, Feb 19, 2021 at 10:59 AM Matthias Seiler <[hidden email]> wrote:

Hi Till,

thanks for the hint, you seem about right. Setting the log level to DEBUG reveals more information, but I don't know what to do about it.

All logs throw some Java related exceptions:
`java.lang.UnsupportedOperationException: Reflective setAccessible(true) disabled`
and
`java.lang.IllegalAccessException: class org.apache.flink.shaded.netty4.io.netty.util.internal.PlatformDependent0$6 cannot access class jdk.internal.misc.Unsafe (in module java.base) because module java.base does not export jdk.internal.misc to unnamed module`

The log of node-2's TaskManager reveals connection problems:
`org.apache.flink.runtime.net.ConnectionUtils                 [] - Failed to connect from address 'node-2/127.0.1.1': Invalid argument (connect failed)`
`java.net.ConnectException: Invalid argument (connect failed)`

What's more, both TaskManagers (node-1 and node-2) are having trouble to load `org_apache_flink_shaded_netty4_netty_transport_native_epoll_x86_64`, but load some version eventually.


There is quite a lot going on here that I don't understand. Can you (or someone) shed some light on it and tell me what I could try?

Some more information:
I appended the following to the `/etc/hosts` file:
```
<ip-node-1> node-1
<ip-node-2> node-2
```
And the `flink/conf/workers` consists of:
```
node-1
node-2
```

Thanks,
Matthias

P.S. I attached the logs for further reference. `<ip-node-1>` is of course the real IP address instead.


On 2/16/21 1:56 PM, Till Rohrmann wrote:
Hi Matthias,

Can you make sure that node-1 and node-2 can talk to each other? It looks to me that node-2 fails to open a connection to the other TaskManager. Maybe the logs give some more insights. You can change the log level to DEBUG to gather more information.

Cheers,
Till

On Tue, Feb 16, 2021 at 10:57 AM Matthias Seiler <[hidden email]> wrote:
Hi Everyone,

I'm trying to setup a Flink cluster in standealone mode with two
machines. However, running a job throws the following exception:
`org.apache.flink.runtime.io.network.netty.exception.LocalTransportException:
Sending the partition request to 'null' failed`

Here is some background:

Machines:
- node-1: JobManager, TaskManager
- node-2: TaskManager

flink-conf-yaml looks like this:
```
jobmanager.rpc.address: node-1
taskmanager.numberOfTaskSlots: 8
parallelism.default: 2
cluster.evenly-spread-out-slots: true
```

Deploying the cluster works: I can see both TaskManagers in the WebUI.

I ran the streaming WordCount example: `flink run
flink-1.12.1/examples/streaming/WordCount.jar --input lorem-ipsum.txt`
- the job has been submitted
- job failed (with the above exception)
- the log of the node-2 also shows the exception, the other logs are
fine (graceful stop)

I played around with the config and observed that
- if parallelism is set to 1, node-1 gets all the slots and node-2 none
- if parallelism is set to 2, each TaskManager occupies 1 TaskSlot (but
fails because of node-2)

I suspect, that the problem must be with the communication between
TaskManagers
- job runs successful if
    - node-1 is the **only** node with x TaskManagers (tested with x=1
and x=2)
    - node-2 is the **only** node with x TaskManagers (tested with x=1
and x=2)
- job fails if
    - node-1 **and** node-2 have one TaskManager

The full exception is:
```
org.apache.flink.client.program.ProgramInvocationException: The main
method caused an error:
org.apache.flink.client.program.ProgramInvocationException: Job failed
(JobID: 547d4d29b3650883aa403cdb5eb1ba5c)
// ... Job failed, Recovery is suppressed by
NoRestartBackoffTimeStrategy, ...
Caused by:
org.apache.flink.runtime.io.network.netty.exception.LocalTransportException:
Sending the partition request to 'null' failed.
    at
org.apache.flink.runtime.io.network.netty.NettyPartitionRequestClient$1.operationComplete(NettyPartitionRequestClient.java:137)
    at
org.apache.flink.runtime.io.network.netty.NettyPartitionRequestClient$1.operationComplete(NettyPartitionRequestClient.java:125)
    at
org.apache.flink.shaded.netty4.io.netty.util.concurrent.DefaultPromise.notifyListener0(DefaultPromise.java:577)
    at
org.apache.flink.shaded.netty4.io.netty.util.concurrent.DefaultPromise.notifyListenersNow(DefaultPromise.java:551)
    at
org.apache.flink.shaded.netty4.io.netty.util.concurrent.DefaultPromise.notifyListeners(DefaultPromise.java:490)
    at
org.apache.flink.shaded.netty4.io.netty.util.concurrent.DefaultPromise.setValue0(DefaultPromise.java:615)
    at
org.apache.flink.shaded.netty4.io.netty.util.concurrent.DefaultPromise.setFailure0(DefaultPromise.java:608)
    at
org.apache.flink.shaded.netty4.io.netty.util.concurrent.DefaultPromise.tryFailure(DefaultPromise.java:117)
    at
org.apache.flink.shaded.netty4.io.netty.channel.AbstractChannel$AbstractUnsafe.safeSetFailure(AbstractChannel.java:993)
    at
org.apache.flink.shaded.netty4.io.netty.channel.AbstractChannel$AbstractUnsafe.write(AbstractChannel.java:865)
    at
org.apache.flink.shaded.netty4.io.netty.channel.DefaultChannelPipeline$HeadContext.write(DefaultChannelPipeline.java:1367)
    at
org.apache.flink.shaded.netty4.io.netty.channel.AbstractChannelHandlerContext.invokeWrite0(AbstractChannelHandlerContext.java:717)
    at
org.apache.flink.shaded.netty4.io.netty.channel.AbstractChannelHandlerContext.invokeWriteAndFlush(AbstractChannelHandlerContext.java:764)
    at
org.apache.flink.shaded.netty4.io.netty.channel.AbstractChannelHandlerContext$WriteTask.run(AbstractChannelHandlerContext.java:1071)
    at
org.apache.flink.shaded.netty4.io.netty.util.concurrent.AbstractEventExecutor.safeExecute(AbstractEventExecutor.java:164)
    at
org.apache.flink.shaded.netty4.io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:472)
    at
org.apache.flink.shaded.netty4.io.netty.channel.epoll.EpollEventLoop.run(EpollEventLoop.java:384)
    at
org.apache.flink.shaded.netty4.io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:989)
    at
org.apache.flink.shaded.netty4.io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74)
    at java.base/java.lang.Thread.run(Thread.java:834)
Caused by: java.nio.channels.ClosedChannelException
    at
org.apache.flink.shaded.netty4.io.netty.channel.AbstractChannel$AbstractUnsafe.newClosedChannelException(AbstractChannel.java:957)
    ... 11 more
```

Thanks in advance,
Matthias

Reply | Threaded
Open this post in threaded view
| More
Print post
Permalink

Re: Netty LocalTransportException: Sending the partition request to 'null' failed

rmetzger0
Hey Matthias,

are you sure you can connect to 127.0.1.1, since everything between 127.0.0.1 and  127.255.255.255 is bound to the loopback device?: https://serverfault.com/a/363098



On Mon, Mar 15, 2021 at 11:13 AM Matthias Seiler <[hidden email]> wrote:

Hi Arvid,

I listened to ports with netcat and connected via telnet and each node can connect to the other and itself.

The `/etc/hosts` file looks like this
```
127.0.0.1   localhost
127.0.1.1   node-2.example.com   node-2

<ip-node-1>   node-1
```
Is the second line the reason it fails? I also replaced all hostnames with IP addresses in the config files (flink-conf, workers, masters) but without effect...

Do you have any ideas what else I could try?

Thanks again,
Matthias

On 2/24/21 2:17 PM, Arvid Heise wrote:
Hi Matthias,

most of the debug statements are just noise. You can ignore that.

Something with your network seems fishy to me. Either taskmanager 1 cannot connect to taskmanager 2 (and vice versa), or the taskmanager cannot connect locally.

I found this fragment, which seems suspicious

Failed to connect to /127.0.1.1:32797. Giving up.

localhost is usually 127.0.0.1. Can you double check that you connect from all machines to all machines (including themselves) by opening trivial text sockets on random ports?

On Fri, Feb 19, 2021 at 10:59 AM Matthias Seiler <[hidden email]> wrote:

Hi Till,

thanks for the hint, you seem about right. Setting the log level to DEBUG reveals more information, but I don't know what to do about it.

All logs throw some Java related exceptions:
`java.lang.UnsupportedOperationException: Reflective setAccessible(true) disabled`
and
`java.lang.IllegalAccessException: class org.apache.flink.shaded.netty4.io.netty.util.internal.PlatformDependent0$6 cannot access class jdk.internal.misc.Unsafe (in module java.base) because module java.base does not export jdk.internal.misc to unnamed module`

The log of node-2's TaskManager reveals connection problems:
`org.apache.flink.runtime.net.ConnectionUtils                 [] - Failed to connect from address 'node-2/127.0.1.1': Invalid argument (connect failed)`
`java.net.ConnectException: Invalid argument (connect failed)`

What's more, both TaskManagers (node-1 and node-2) are having trouble to load `org_apache_flink_shaded_netty4_netty_transport_native_epoll_x86_64`, but load some version eventually.


There is quite a lot going on here that I don't understand. Can you (or someone) shed some light on it and tell me what I could try?

Some more information:
I appended the following to the `/etc/hosts` file:
```
<ip-node-1> node-1
<ip-node-2> node-2
```
And the `flink/conf/workers` consists of:
```
node-1
node-2
```

Thanks,
Matthias

P.S. I attached the logs for further reference. `<ip-node-1>` is of course the real IP address instead.


On 2/16/21 1:56 PM, Till Rohrmann wrote:
Hi Matthias,

Can you make sure that node-1 and node-2 can talk to each other? It looks to me that node-2 fails to open a connection to the other TaskManager. Maybe the logs give some more insights. You can change the log level to DEBUG to gather more information.

Cheers,
Till

On Tue, Feb 16, 2021 at 10:57 AM Matthias Seiler <[hidden email]> wrote:
Hi Everyone,

I'm trying to setup a Flink cluster in standealone mode with two
machines. However, running a job throws the following exception:
`org.apache.flink.runtime.io.network.netty.exception.LocalTransportException:
Sending the partition request to 'null' failed`

Here is some background:

Machines:
- node-1: JobManager, TaskManager
- node-2: TaskManager

flink-conf-yaml looks like this:
```
jobmanager.rpc.address: node-1
taskmanager.numberOfTaskSlots: 8
parallelism.default: 2
cluster.evenly-spread-out-slots: true
```

Deploying the cluster works: I can see both TaskManagers in the WebUI.

I ran the streaming WordCount example: `flink run
flink-1.12.1/examples/streaming/WordCount.jar --input lorem-ipsum.txt`
- the job has been submitted
- job failed (with the above exception)
- the log of the node-2 also shows the exception, the other logs are
fine (graceful stop)

I played around with the config and observed that
- if parallelism is set to 1, node-1 gets all the slots and node-2 none
- if parallelism is set to 2, each TaskManager occupies 1 TaskSlot (but
fails because of node-2)

I suspect, that the problem must be with the communication between
TaskManagers
- job runs successful if
    - node-1 is the **only** node with x TaskManagers (tested with x=1
and x=2)
    - node-2 is the **only** node with x TaskManagers (tested with x=1
and x=2)
- job fails if
    - node-1 **and** node-2 have one TaskManager

The full exception is:
```
org.apache.flink.client.program.ProgramInvocationException: The main
method caused an error:
org.apache.flink.client.program.ProgramInvocationException: Job failed
(JobID: 547d4d29b3650883aa403cdb5eb1ba5c)
// ... Job failed, Recovery is suppressed by
NoRestartBackoffTimeStrategy, ...
Caused by:
org.apache.flink.runtime.io.network.netty.exception.LocalTransportException:
Sending the partition request to 'null' failed.
    at
org.apache.flink.runtime.io.network.netty.NettyPartitionRequestClient$1.operationComplete(NettyPartitionRequestClient.java:137)
    at
org.apache.flink.runtime.io.network.netty.NettyPartitionRequestClient$1.operationComplete(NettyPartitionRequestClient.java:125)
    at
org.apache.flink.shaded.netty4.io.netty.util.concurrent.DefaultPromise.notifyListener0(DefaultPromise.java:577)
    at
org.apache.flink.shaded.netty4.io.netty.util.concurrent.DefaultPromise.notifyListenersNow(DefaultPromise.java:551)
    at
org.apache.flink.shaded.netty4.io.netty.util.concurrent.DefaultPromise.notifyListeners(DefaultPromise.java:490)
    at
org.apache.flink.shaded.netty4.io.netty.util.concurrent.DefaultPromise.setValue0(DefaultPromise.java:615)
    at
org.apache.flink.shaded.netty4.io.netty.util.concurrent.DefaultPromise.setFailure0(DefaultPromise.java:608)
    at
org.apache.flink.shaded.netty4.io.netty.util.concurrent.DefaultPromise.tryFailure(DefaultPromise.java:117)
    at
org.apache.flink.shaded.netty4.io.netty.channel.AbstractChannel$AbstractUnsafe.safeSetFailure(AbstractChannel.java:993)
    at
org.apache.flink.shaded.netty4.io.netty.channel.AbstractChannel$AbstractUnsafe.write(AbstractChannel.java:865)
    at
org.apache.flink.shaded.netty4.io.netty.channel.DefaultChannelPipeline$HeadContext.write(DefaultChannelPipeline.java:1367)
    at
org.apache.flink.shaded.netty4.io.netty.channel.AbstractChannelHandlerContext.invokeWrite0(AbstractChannelHandlerContext.java:717)
    at
org.apache.flink.shaded.netty4.io.netty.channel.AbstractChannelHandlerContext.invokeWriteAndFlush(AbstractChannelHandlerContext.java:764)
    at
org.apache.flink.shaded.netty4.io.netty.channel.AbstractChannelHandlerContext$WriteTask.run(AbstractChannelHandlerContext.java:1071)
    at
org.apache.flink.shaded.netty4.io.netty.util.concurrent.AbstractEventExecutor.safeExecute(AbstractEventExecutor.java:164)
    at
org.apache.flink.shaded.netty4.io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:472)
    at
org.apache.flink.shaded.netty4.io.netty.channel.epoll.EpollEventLoop.run(EpollEventLoop.java:384)
    at
org.apache.flink.shaded.netty4.io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:989)
    at
org.apache.flink.shaded.netty4.io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74)
    at java.base/java.lang.Thread.run(Thread.java:834)
Caused by: java.nio.channels.ClosedChannelException
    at
org.apache.flink.shaded.netty4.io.netty.channel.AbstractChannel$AbstractUnsafe.newClosedChannelException(AbstractChannel.java:957)
    ... 11 more
```

Thanks in advance,
Matthias

Reply | Threaded
Open this post in threaded view
| More
Print post
Permalink

Re: Netty LocalTransportException: Sending the partition request to 'null' failed

Matthias Seiler

Thanks a bunch! I replaced 127.0.1.1 with the actual IP address and it works now :)

On 3/15/21 3:22 PM, Robert Metzger wrote:
Hey Matthias,

are you sure you can connect to 127.0.1.1, since everything between 127.0.0.1 and  127.255.255.255 is bound to the loopback device?: https://serverfault.com/a/363098



On Mon, Mar 15, 2021 at 11:13 AM Matthias Seiler <[hidden email]> wrote:

Hi Arvid,

I listened to ports with netcat and connected via telnet and each node can connect to the other and itself.

The `/etc/hosts` file looks like this
```
127.0.0.1   localhost
127.0.1.1   node-2.example.com   node-2

<ip-node-1>   node-1
```
Is the second line the reason it fails? I also replaced all hostnames with IP addresses in the config files (flink-conf, workers, masters) but without effect...

Do you have any ideas what else I could try?

Thanks again,
Matthias

On 2/24/21 2:17 PM, Arvid Heise wrote:
Hi Matthias,

most of the debug statements are just noise. You can ignore that.

Something with your network seems fishy to me. Either taskmanager 1 cannot connect to taskmanager 2 (and vice versa), or the taskmanager cannot connect locally.

I found this fragment, which seems suspicious

Failed to connect to /127.0.1.1:32797. Giving up.

localhost is usually 127.0.0.1. Can you double check that you connect from all machines to all machines (including themselves) by opening trivial text sockets on random ports?

On Fri, Feb 19, 2021 at 10:59 AM Matthias Seiler <[hidden email]> wrote:

Hi Till,

thanks for the hint, you seem about right. Setting the log level to DEBUG reveals more information, but I don't know what to do about it.

All logs throw some Java related exceptions:
`java.lang.UnsupportedOperationException: Reflective setAccessible(true) disabled`
and
`java.lang.IllegalAccessException: class org.apache.flink.shaded.netty4.io.netty.util.internal.PlatformDependent0$6 cannot access class jdk.internal.misc.Unsafe (in module java.base) because module java.base does not export jdk.internal.misc to unnamed module`

The log of node-2's TaskManager reveals connection problems:
`org.apache.flink.runtime.net.ConnectionUtils                 [] - Failed to connect from address 'node-2/127.0.1.1': Invalid argument (connect failed)`
`java.net.ConnectException: Invalid argument (connect failed)`

What's more, both TaskManagers (node-1 and node-2) are having trouble to load `org_apache_flink_shaded_netty4_netty_transport_native_epoll_x86_64`, but load some version eventually.


There is quite a lot going on here that I don't understand. Can you (or someone) shed some light on it and tell me what I could try?

Some more information:
I appended the following to the `/etc/hosts` file:
```
<ip-node-1> node-1
<ip-node-2> node-2
```
And the `flink/conf/workers` consists of:
```
node-1
node-2
```

Thanks,
Matthias

P.S. I attached the logs for further reference. `<ip-node-1>` is of course the real IP address instead.


On 2/16/21 1:56 PM, Till Rohrmann wrote:
Hi Matthias,

Can you make sure that node-1 and node-2 can talk to each other? It looks to me that node-2 fails to open a connection to the other TaskManager. Maybe the logs give some more insights. You can change the log level to DEBUG to gather more information.

Cheers,
Till

On Tue, Feb 16, 2021 at 10:57 AM Matthias Seiler <[hidden email]> wrote:
Hi Everyone,

I'm trying to setup a Flink cluster in standealone mode with two
machines. However, running a job throws the following exception:
`org.apache.flink.runtime.io.network.netty.exception.LocalTransportException:
Sending the partition request to 'null' failed`

Here is some background:

Machines:
- node-1: JobManager, TaskManager
- node-2: TaskManager

flink-conf-yaml looks like this:
```
jobmanager.rpc.address: node-1
taskmanager.numberOfTaskSlots: 8
parallelism.default: 2
cluster.evenly-spread-out-slots: true
```

Deploying the cluster works: I can see both TaskManagers in the WebUI.

I ran the streaming WordCount example: `flink run
flink-1.12.1/examples/streaming/WordCount.jar --input lorem-ipsum.txt`
- the job has been submitted
- job failed (with the above exception)
- the log of the node-2 also shows the exception, the other logs are
fine (graceful stop)

I played around with the config and observed that
- if parallelism is set to 1, node-1 gets all the slots and node-2 none
- if parallelism is set to 2, each TaskManager occupies 1 TaskSlot (but
fails because of node-2)

I suspect, that the problem must be with the communication between
TaskManagers
- job runs successful if
    - node-1 is the **only** node with x TaskManagers (tested with x=1
and x=2)
    - node-2 is the **only** node with x TaskManagers (tested with x=1
and x=2)
- job fails if
    - node-1 **and** node-2 have one TaskManager

The full exception is:
```
org.apache.flink.client.program.ProgramInvocationException: The main
method caused an error:
org.apache.flink.client.program.ProgramInvocationException: Job failed
(JobID: 547d4d29b3650883aa403cdb5eb1ba5c)
// ... Job failed, Recovery is suppressed by
NoRestartBackoffTimeStrategy, ...
Caused by:
org.apache.flink.runtime.io.network.netty.exception.LocalTransportException:
Sending the partition request to 'null' failed.
    at
org.apache.flink.runtime.io.network.netty.NettyPartitionRequestClient$1.operationComplete(NettyPartitionRequestClient.java:137)
    at
org.apache.flink.runtime.io.network.netty.NettyPartitionRequestClient$1.operationComplete(NettyPartitionRequestClient.java:125)
    at
org.apache.flink.shaded.netty4.io.netty.util.concurrent.DefaultPromise.notifyListener0(DefaultPromise.java:577)
    at
org.apache.flink.shaded.netty4.io.netty.util.concurrent.DefaultPromise.notifyListenersNow(DefaultPromise.java:551)
    at
org.apache.flink.shaded.netty4.io.netty.util.concurrent.DefaultPromise.notifyListeners(DefaultPromise.java:490)
    at
org.apache.flink.shaded.netty4.io.netty.util.concurrent.DefaultPromise.setValue0(DefaultPromise.java:615)
    at
org.apache.flink.shaded.netty4.io.netty.util.concurrent.DefaultPromise.setFailure0(DefaultPromise.java:608)
    at
org.apache.flink.shaded.netty4.io.netty.util.concurrent.DefaultPromise.tryFailure(DefaultPromise.java:117)
    at
org.apache.flink.shaded.netty4.io.netty.channel.AbstractChannel$AbstractUnsafe.safeSetFailure(AbstractChannel.java:993)
    at
org.apache.flink.shaded.netty4.io.netty.channel.AbstractChannel$AbstractUnsafe.write(AbstractChannel.java:865)
    at
org.apache.flink.shaded.netty4.io.netty.channel.DefaultChannelPipeline$HeadContext.write(DefaultChannelPipeline.java:1367)
    at
org.apache.flink.shaded.netty4.io.netty.channel.AbstractChannelHandlerContext.invokeWrite0(AbstractChannelHandlerContext.java:717)
    at
org.apache.flink.shaded.netty4.io.netty.channel.AbstractChannelHandlerContext.invokeWriteAndFlush(AbstractChannelHandlerContext.java:764)
    at
org.apache.flink.shaded.netty4.io.netty.channel.AbstractChannelHandlerContext$WriteTask.run(AbstractChannelHandlerContext.java:1071)
    at
org.apache.flink.shaded.netty4.io.netty.util.concurrent.AbstractEventExecutor.safeExecute(AbstractEventExecutor.java:164)
    at
org.apache.flink.shaded.netty4.io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:472)
    at
org.apache.flink.shaded.netty4.io.netty.channel.epoll.EpollEventLoop.run(EpollEventLoop.java:384)
    at
org.apache.flink.shaded.netty4.io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:989)
    at
org.apache.flink.shaded.netty4.io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74)
    at java.base/java.lang.Thread.run(Thread.java:834)
Caused by: java.nio.channels.ClosedChannelException
    at
org.apache.flink.shaded.netty4.io.netty.channel.AbstractChannel$AbstractUnsafe.newClosedChannelException(AbstractChannel.java:957)
    ... 11 more
```

Thanks in advance,
Matthias