Beam WordCount on Flink/Kubernetes: issue with task manager not getting picked up

Avijit Saha
Hi,

I have a Docker image of the Beam WordCount example that reads a status file and produces a one-time output with word counts, etc.

This runs fine as a separate job-manager and task-manager when run from docker-compose locally.

Now I am trying to deploy and run this on my Kubernetes cluster as per the instructions at https://github.com/apache/flink/tree/release-1.10/flink-container/kubernetes.

Deployment of the job cluster and the task manager goes through fine, but the task-manager pod never seems to be picked up: it always stays in 'Pending' status!
Is this expected behavior for a one-time job like the WordCount application, or am I missing something?

Thanks
Avijit

$ kubectl get pods
NAME                                  READY   STATUS    RESTARTS   AGE
flink-job-cluster-kw85v               2/2     Running   2          15m
flink-task-manager-5cc79c5795-7mnqh   0/2     Pending   0          14m

Yang Wang
Hi Avijit Saha,

I think you could use 'kubectl describe pod flink-task-manager-5cc79c5795-7mnqh' to get more information.
Usually, a pod stuck in 'Pending' is caused by insufficient resources in your K8s cluster.
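
A minimal sketch of that check, assuming kubectl is already configured for the cluster. The function name diagnose_pending_pod is made up for illustration and is not part of kubectl or Flink:

```shell
# Hypothetical helper: gather the usual evidence for a pod stuck in Pending.
diagnose_pending_pod() {
  pod="$1"

  # The Events section at the end of this output normally states why the
  # scheduler could not place the pod (for example, insufficient CPU or memory).
  kubectl describe pod "$pod"

  # Only the events that refer to this pod.
  kubectl get events --field-selector "involvedObject.name=$pod"

  # Per-node capacity and the resources already allocated on each node.
  kubectl describe nodes
}
```

It could be called as: diagnose_pending_pod flink-task-manager-5cc79c5795-7mnqh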


Best,
Yang

Replying to Avijit Saha's message of Tue, Jul 14, 2020, 7:12 AM.

Re: Docker TaskManager unable to connect to Flink JobManager: Connection refused

Avijit Saha
Hi,
I have built a docker image containing both Flink 1.11 and the job jar as per instructions at:
https://ci.apache.org/projects/flink/flink-docs-stable/ops/deployment/docker.html 

The jobmanager starts up fine as follows:

- FLINK_PROPERTIES="jobmanager.rpc.address: 127.0.0.1"
- docker run --env FLINK_PROPERTIES="${FLINK_PROPERTIES}" -p 0.0.0.0:6123:6123/tcp flink_with_job_artifacts standalone-job --job-classname=org.apache.beam.examples.WordCount --runner=FlinkRunner --inputFile=/opt/flink/conf/flink-conf.yaml --output=/tmp/counts

However, when I launch the taskmanager as follows:

- FLINK_PROPERTIES="jobmanager.rpc.address: 127.0.0.1"
- docker run --env FLINK_PROPERTIES="${FLINK_PROPERTIES}" flink_with_job_artifacts taskmanager

the taskmanager fails with the following:

.........
2020-07-22 16:55:25,974 INFO  org.apache.flink.runtime.net.ConnectionUtils                 [] - Failed to connect from address '/127.0.0.1': Connection refused (Connection refused)
........
2020-07-22 16:55:32,709 INFO  org.apache.flink.runtime.taskexecutor.TaskExecutor           [] - Could not resolve ResourceManager address akka.tcp://flink@127.0.0.1:6123/user/rpc/resourcemanager_*, retrying in 10000 ms: Could not connect to rpc endpoint under address akka.tcp://flink@127.0.0.1:6123/user/rpc/resourcemanager_*.


Any pointers on how to fix this?

Thanks
Avijit

Yang Wang
Hi Avijit,

I think you need to create a network via "docker network create flink-network",
and then use "docker run ... --name=jobmanager --network flink-network" so that the container name serves as its hostname.
"jobmanager.rpc.address" also needs to be set to jobmanager. Refer to the doc [1] for more information.
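
A rough sketch of the network-based setup, reusing the image name and job arguments from the original post. The function name start_flink_job_cluster, the -d (detached) flag, and the taskmanager container name are assumptions added for illustration:

```shell
# The taskmanager resolves the jobmanager by container name on the shared
# network, so the RPC address is the name "jobmanager", not 127.0.0.1.
FLINK_PROPERTIES="jobmanager.rpc.address: jobmanager"

# Hypothetical wrapper so both containers can be started with one call.
start_flink_job_cluster() {
  # User-defined network on which containers can reach each other by name.
  docker network create flink-network

  # --name=jobmanager makes this container reachable as "jobmanager".
  docker run -d --name=jobmanager --network flink-network \
    --env FLINK_PROPERTIES="${FLINK_PROPERTIES}" \
    flink_with_job_artifacts standalone-job \
    --job-classname=org.apache.beam.examples.WordCount \
    --runner=FlinkRunner \
    --inputFile=/opt/flink/conf/flink-conf.yaml \
    --output=/tmp/counts

  # Same network and same FLINK_PROPERTIES, so the RPC address resolves.
  docker run -d --name=taskmanager --network flink-network \
    --env FLINK_PROPERTIES="${FLINK_PROPERTIES}" \
    flink_with_job_artifacts taskmanager
}
```

Calling start_flink_job_cluster runs the three docker commands in order. With both containers on flink-network, no -p port mapping is needed for the taskmanager to reach port 6123 on the jobmanager.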

If you really do not want to create a network and still want to use port forwarding directly, then you
need to set "jobmanager.rpc.address" to the host's local IP address, for example 192.168.0.100, not 127.0.0.1.
Inside a Docker container, 127.0.0.1 refers to the container itself, not to the host machine.



Best,
Yang

Replying to Avijit Saha's message of Thu, Jul 23, 2020, 1:14 AM.

Avijit Saha
Thanks Yang!

It worked as expected after I made the changes suggested by you!

Avijit

Replying to Yang Wang's message of Wed, Jul 22, 2020, 11:05 PM.