Hi,
I have a docker image of the Beam WordCount example that reads a status file and produces a output one time with word counts etc. This runs fine as a separate job-manager and task-manager when run from docker-compose locally. Now, I am trying to deploy and run this on my Kubernetes cluster as per instructions at https://github.com/apache/flink/tree/release-1.10/flink-container/kubernetes. Deployment of Job-cluster and task-manager goes thro fine but the task-manager never seems to be picked up - it always stays in 'Pending' status! Is this expected behavior for a one-time Job like Word-Count application or am I missing something? Thanks Avijit $ kubectl get pods NAME READY STATUS RESTARTS AGE flink-job-cluster-kw85v 2/2 Running 2 15m flink-task-manager-5cc79c5795-7mnqh 0/2 Pending 0 14m |
Hi Avijit Saha, I think you could use 'kubectl describe pod flink-task-manager-5cc79c5795-7mnqh' to get more information. Usually, it is caused by no enough resource in your K8s cluster. Best, Yang Avijit Saha <[hidden email]> 于2020年7月14日周二 上午7:12写道:
|
In reply to this post by Avijit Saha
Hi,
I have built a docker image containing both Flink 1.11 and the job jar as per instructions at: https://ci.apache.org/projects/flink/flink-docs-stable/ops/deployment/docker.html The jobanager starts up fine as follows: - FLINK_PROPERTIES="jobmanager.rpc.address: 127.0.0.1" - docker run --env FLINK_PROPERTIES={"${FLINK_PROPERTIES}" -p 0.0.0.0:6123:6123/tcp flink_with_job_artifacts standalone-job --job-classname=org.apache.beam.examples.WordCount --runner=FlinkRunner --inputFile=/opt/flink/conf/flink-conf.yaml --output=/tmp/counts Now, when launching the taskmanager as follows: - FLINK_PROPERTIES="jobmanager.rpc.address: 127.0.0.1" - docker run --env FLINK_PROPERTIES="${FLINK_PROPERTIES}" flink_with_job_artifacts taskmanager, the taskmanager fails with the following: ......... 2020-07-22 16:55:25,974 INFO org.apache.flink.runtime.net.ConnectionUtils [] - Failed to connect from address '/127.0.0.1': Connection refused (Connection refused) ........ 2020-07-22 16:55:32,709 INFO org.apache.flink.runtime.taskexecutor.TaskExecutor [] - Could not resolve ResourceManager address akka.tcp://flink@127.0.0.1:6123/user/rpc/resourcemanager_*, retrying in 10000 ms: Could not connect to rpc endpoint under address akka.tcp://flink@127.0.0.1:6123/user/rpc/resourcemanager_*. Any pointer on How to fix this? Thanks Avijit |
Hi Avijit, I think you need to create a network via "docker network create flink-network". And then use "docker run ... --name=jobmanager --network flink-network" to set the hostname. Also "jobmanager.rpc.address" need to be set the jobmanager. Refer the doc[1] for more information. If you really do not want to create a network and still want to use the port forward directly. Then you need to set the "jobmanager.rpc.address" to a local ip address, for example, 192.168.0.100, not 127.0.0.1. Since 127.0.0.1 in docker container means a container local address, not host machine local address. Best, Yang Avijit Saha <[hidden email]> 于2020年7月23日周四 上午1:14写道:
|
Thanks Yang! It worked as expected after I made the changes suggested by you! Avijit On Wed, Jul 22, 2020 at 11:05 PM Yang Wang <[hidden email]> wrote:
|
Free forum by Nabble | Edit this page |