flink:latest container on kubernetes fails to connect taskmanager to jobmanager


jwatte
I'm using the standard Kubernetes deploy configs for jobmanager and
taskmanager deployments, and jobmanager service.
However, when the task managers start up, they try to register with the job
manager over Akka on port 6123.
This fails because Akka on the jobmanager discards those messages as
"non-local."

The taskmanager keeps repeating this log message and eventually exits
(and gets restarted by Kubernetes):

2018-10-01 20:08:28,365 INFO
org.apache.flink.runtime.taskexecutor.TaskExecutor            - Could not
resolve ResourceManager address
akka.tcp://flink@flink-jobmanager:6123/user/resourcemanager, retrying in
10000 ms: Ask timed out on
[ActorSelection[Anchor(akka.tcp://flink@flink-jobmanager:6123/),
Path(/user/resourcemanager)]] after [10000 ms]. Sender[null] sent message of
type "akka.actor.Identify"..

The jobmanager responds with this log message:

2018-10-01 20:09:38,475 ERROR akka.remote.EndpointWriter                                  
- dropping message [class akka.actor.ActorSelectionMessage] for non-local
recipient [Actor[akka.tcp://flink@flink-jobmanager:6123/]] arriving at
[akka.tcp://flink@flink-jobmanager:6123] inbound addresses are
[akka.tcp://flink@cluster:6123]

I have verified that network connectivity exists, so this is some
configuration problem.
I notice that the docker-entrypoint.sh edits the config files and calls the
taskmanager.sh / jobmanager.sh scripts based on start mode.
Is this script editing the config file incorrectly? What needs to be done so
that Akka on the jobmanager accepts the registration messages?
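
The jobmanager error above names two different hosts: the address the message was sent to (flink-jobmanager) and the address Akka considers its own (cluster). Akka drops messages whose recipient address does not match one of its inbound addresses. A minimal sketch for pulling both hosts out of such a log line and comparing them; the variable names are illustrative only, and the log line is the one quoted in this post, joined onto a single line:

```shell
#!/bin/sh
# The jobmanager error from this post, joined onto one line.
log='dropping message [class akka.actor.ActorSelectionMessage] for non-local recipient [Actor[akka.tcp://flink@flink-jobmanager:6123/]] arriving at [akka.tcp://flink@flink-jobmanager:6123] inbound addresses are [akka.tcp://flink@cluster:6123]'

# Host the taskmanager addressed (taken from the "arriving at" part).
recipient_host=$(printf '%s\n' "$log" | sed -n 's|.*arriving at \[akka\.tcp://flink@\([^:]*\):.*|\1|p')

# Host the jobmanager's Akka actually bound to (the "inbound addresses" part).
inbound_host=$(printf '%s\n' "$log" | sed -n 's|.*inbound addresses are \[akka\.tcp://flink@\([^:]*\):.*|\1|p')

if [ "$recipient_host" != "$inbound_host" ]; then
  echo "mismatch: sent to '$recipient_host', jobmanager bound as '$inbound_host'"
fi
```

A mismatch like this suggests the jobmanager was started with a different hostname than the one the taskmanagers use to reach it, rather than a network problem.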




--
Sent from: http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/

Re: flink:latest container on kubernetes fails to connect taskmanager to jobmanager

jwatte
It turns out that the current flink:latest docker image is 5 days old, and
this bug was fixed 4 days ago in the flink-docker GitHub repository.

The problem is that the docker-entrypoint.sh script chains to jobmanager.sh
by saying "start-foreground cluster" where the "cluster" argument is
obsolete as of Flink 1.5.

I patched it with a sed command in the Kubernetes manifest, until the
updated docker image makes its way to the world.
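
A sketch of what such a sed patch could look like. The exact sed command used is not shown in this thread, so this is an assumption. The line written to the temporary file is a hypothetical stand-in for the relevant line of docker-entrypoint.sh; the real script's wording may differ, so inspect it before patching:

```shell
#!/bin/sh
# Hypothetical stand-in for the jobmanager line in docker-entrypoint.sh.
entrypoint="$(mktemp)"
printf '%s\n' 'exec "$FLINK_HOME/bin/jobmanager.sh" start-foreground cluster' > "$entrypoint"

# Drop the obsolete "cluster" argument that follows start-foreground.
# -i.bak edits in place and keeps a backup; this form works with GNU and BSD sed.
sed -i.bak 's/start-foreground cluster/start-foreground/' "$entrypoint"

# Show the patched line.
cat "$entrypoint"
```

In a Kubernetes manifest, the same substitution would be run against the image's real entrypoint script in the container command, before chaining to the entrypoint itself.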




Re: flink:latest container on kubernetes fails to connect taskmanager to jobmanager

vino yang
Hi jwatte,

Maybe Till can help you.

Thanks, vino.


Re: flink:latest container on kubernetes fails to connect taskmanager to jobmanager

Till Rohrmann
Hi jwatte,

Sorry for the inconvenience. I hope that the Docker Hub images have been updated by now.

Cheers,
Till
