Docker-based TaskManager can't connect to JobManager/ResourceManager

Günter Hipler-2
Hi,

I'm trying to start a mini cluster following the explanations given in a Flink Forward presentation [1].

Starting the JobManager works:

FLINK_PROPERTIES="jobmanager.memory.process.size: 2048m
parallelism.default: 4
"
docker network create flink-network

docker run  \
--rm   \
--name=jobmanager  \
--network flink-network \
-p 8081:8081  \
--env FLINK_PROPERTIES="${FLINK_PROPERTIES}"  \
flink:1.13.0-scala_2.12-java11 jobmanager
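Before starting the TaskManager, it can help to confirm that the JobManager's RPC port is reachable from another container under the name `jobmanager` (Docker's embedded DNS resolves container names on user-defined networks such as `flink-network`). This is a minimal sketch, assuming bash's `/dev/tcp` redirection and the coreutils `timeout` command are available; `port_open` is a hypothetical helper, and 6123 is the port that appears in the TaskManager's connection errors.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeeds (exit 0) if a TCP connection to host:port
# can be opened within 3 seconds, fails (non-zero) otherwise.
# Relies on bash's /dev/tcp pseudo-device and coreutils' timeout.
port_open() {
  local host="$1" port="$2"
  timeout 3 bash -c "exec 3<>/dev/tcp/${host}/${port}" 2>/dev/null
}

# Example: the same probe run from a throwaway container attached to
# flink-network (the helper itself is not available inside the container):
#   docker run --rm --network flink-network --entrypoint bash \
#     flink:1.13.0-scala_2.12-java11 \
#     -c 'timeout 3 bash -c "exec 3<>/dev/tcp/jobmanager/6123" && echo reachable'
```

If the probe prints `reachable`, the JobManager is listening and resolvable by name, and any remaining problem is on the TaskManager's configuration side.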


Unfortunately, the TaskManager can't connect:

docker run  \
--rm   \
--name=taskmanager  \
--network flink-network \
--env FLINK_PROPERTIES="${FLINK_PROPERTIES}"  \
flink:1.13.0-scala_2.12-java11 taskmanager

2021-05-12 19:43:11,396 INFO org.apache.flink.runtime.net.ConnectionUtils                 [] - Failed to connect from address '/172.20.0.3': Connection refused (Connection refused)
2021-05-12 19:44:26,082 WARN akka.remote.transport.netty.NettyTransport                   [] - Remote connection to [null] failed with java.net.ConnectException: Connection refused: 5e8efb79f191/172.20.0.3:6123
2021-05-12 19:44:26,084 INFO org.apache.flink.runtime.taskexecutor.TaskExecutor           [] - Could not resolve ResourceManager address akka.tcp://flink@5e8efb79f191:6123/user/rpc/resourcemanager_*, retrying in 10000 ms: Could not connect to rpc endpoint under address akka.tcp://flink@5e8efb79f191:6123/user/rpc/resourcemanager_*.
2021-05-12 19:44:26,084 WARN akka.remote.ReliableDeliverySupervisor                       [] - Association with remote system [akka.tcp://flink@5e8efb79f191:6123] has failed, address is now gated for [50] ms. Reason: [Association failed with [akka.tcp://flink@5e8efb79f191:6123]] Caused by: [java.net.ConnectException: Connection refused: 5e8efb79f191/172.20.0.3:6123]

and the JobManager's dashboard doesn't show the TaskManager, although I would expect it to be listed there.
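Judging from the log (an interpretation, not confirmed in this thread), the TaskManager is dialing port 6123 on 172.20.0.3 under the hostname 5e8efb79f191, which most likely is its own container rather than the JobManager: the JobManager, started first on the fresh network, would typically have received 172.20.0.2. One way to check which JobManager address the TaskManager ended up with is to read `jobmanager.rpc.address` from the configuration inside the running container. A minimal sketch, assuming the official image keeps its configuration at `/opt/flink/conf/flink-conf.yaml`; `rpc_address_from_conf` is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the value of jobmanager.rpc.address from a
# flink-conf.yaml file. Commented-out lines are ignored; if the key
# appears more than once, only the last match is printed.
rpc_address_from_conf() {
  sed -n 's/^[[:space:]]*jobmanager\.rpc\.address:[[:space:]]*//p' "$1" | tail -n 1
}

# Example, assuming the container is named "taskmanager" as in the command above:
#   docker exec taskmanager cat /opt/flink/conf/flink-conf.yaml > tm-conf.yaml
#   rpc_address_from_conf tm-conf.yaml
```

If this prints the container's own hostname instead of `jobmanager`, the TaskManager was never told where the JobManager runs.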

Any hints? Thanks!

Günter


[1] "Flink's New Dockerfile: One File to Rule Them All" (Flink Forward presentation), https://www.youtube.com/watch?v=VVh6ikd-l9s&list=PLDX4T_cnKjD054YExbUOkr_xdYknVPQUm&index=45

Re: Docker-based TaskManager can't connect to JobManager/ResourceManager

Guowei Ma
Hi,
I haven't tried it myself, but from the documentation [1] it seems that you might need to add "jobmanager.rpc.address: jobmanager" to FLINK_PROPERTIES before creating the network.
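Applied to the commands from the original post, the suggestion could look like the sketch below. The only functional change is the extra `jobmanager.rpc.address: jobmanager` line, where `jobmanager` is the `--name` of the JobManager container, which Docker's embedded DNS resolves for other containers on `flink-network`. The wrapper functions `start_jobmanager` and `start_taskmanager` and the `IMAGE` variable are hypothetical conveniences; the `docker run` invocations are the ones from the original post.

```shell
#!/usr/bin/env bash
# Same properties as in the original post, plus the JobManager's RPC address.
# "jobmanager" is the container name used below, resolvable by other
# containers attached to the user-defined network flink-network.
FLINK_PROPERTIES="jobmanager.rpc.address: jobmanager
jobmanager.memory.process.size: 2048m
parallelism.default: 4
"
IMAGE="flink:1.13.0-scala_2.12-java11"

# Hypothetical wrapper: start the JobManager with its web UI on port 8081.
start_jobmanager() {
  docker run --rm --name=jobmanager --network flink-network -p 8081:8081 \
    --env FLINK_PROPERTIES="${FLINK_PROPERTIES}" "${IMAGE}" jobmanager
}

# Hypothetical wrapper: start a TaskManager, which should now look for the
# JobManager at jobmanager:6123 instead of its own hostname.
start_taskmanager() {
  docker run --rm --name=taskmanager --network flink-network \
    --env FLINK_PROPERTIES="${FLINK_PROPERTIES}" "${IMAGE}" taskmanager
}

# Usage: run "docker network create flink-network" once, then source this
# file in two terminals and call start_jobmanager in one and
# start_taskmanager in the other.
```

Both containers receive the same FLINK_PROPERTIES, so the JobManager and every TaskManager agree on the RPC address.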


Re: Docker-based TaskManager can't connect to JobManager/ResourceManager

Günter Hipler-2

Hi Guowei,

Thanks for your reply! That was the missing piece of information. The presenter mentioned the documentation, but I hadn't found it, so your link to the specific section is valuable too.

Günter

Re: Docker-based TaskManager can't connect to JobManager/ResourceManager

Guowei Ma
