Hello Flink team,
We use Flink on DCOS and have problems submitting a Flink job from within a container to the Flink cluster. Both the container and the Flink cluster are running inside DCOS, on different nodes.
We have the following setup: Flink was installed on DCOS using the package from the catalog. According to the Flink UI ([DCOS-URL]/service/flink/) the Flink job manager settings are:
jobmanager.rpc.address      ip-10-0-1-95.eu-central-1.compute.internal
jobmanager.rpc.port         14503
jobmanager.web.port         14502
mesos.artifact-server.port  14505
where "ip-10-0-1-95.eu-central-1.compute.internal" is the host name of the DCOS node with IP 10.0.1.95 on which the container with the job manager is running.
Furthermore, a VIP is configured for both the job manager RPC port and the job manager web port:
job manager RPC port: flink.marathon.l4lb.thisdcos.directory:6123
job manager web port: flink.marathon.l4lb.thisdcos.directory:8081
Now if we try to submit a Flink job to the job manager via the Flink CLI performing the following steps:
1) log into the DCOS master node:
   dcos node ssh --leader --master-proxy
2) start an interactive session inside a Docker container using the Mesosphere Flink image:
   docker run --rm -it mesosphere/dcos-flink:1.4.2-1.0 /bin/bash
3) submit a Flink job to the Flink job manager:
   cd /flink-1.4.2
   ./bin/flink run -m ip-10-0-1-95.eu-central-1.compute.internal:14503 examples/streaming/WordCount.jar
everything works fine. The job appears as an entry within the Flink UI and we get the results we expect.
But if we try to submit the same job using the VIP of the job manager (flink.marathon.l4lb.thisdcos.directory:6123):
./bin/flink run -m flink.marathon.l4lb.thisdcos.directory:6123 examples/streaming/WordCount.jar
or if we try to submit the job to the job manager using the IP of the DCOS node instead of its host name:
./bin/flink run -m 10.0.1.95:14503 examples/streaming/WordCount.jar
the job cannot be submitted. Apparently the connection to the job manager cannot be established, and nothing appears in the Flink UI. You can find the output in the attachment. Submitting to the job manager using the URL from Mesos DNS does not work either.
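To separate plain network reachability from anything specific to the Flink client, a TCP check against the three addresses above could be run from inside the same container. The following is only a sketch: the helper name can_connect and the 3-second timeout are assumptions for illustration, and it assumes a Python 3 interpreter is available in the container. It only tests whether a TCP connection can be opened; it says nothing about whether the job manager accepts the submission.

```python
import socket


def can_connect(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port can be opened.

    Hypothetical helper: name resolution failures, refused connections
    and timeouts are all reported as False.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


# Addresses taken from the setup described above.
ENDPOINTS = [
    ("ip-10-0-1-95.eu-central-1.compute.internal", 14503),  # host name
    ("10.0.1.95", 14503),                                   # node IP
    ("flink.marathon.l4lb.thisdcos.directory", 6123),       # VIP
]

if __name__ == "__main__":
    for host, port in ENDPOINTS:
        state = "reachable" if can_connect(host, port) else "NOT reachable"
        print(f"{host}:{port} {state}")
```

If all three addresses turn out to be reachable at the TCP level, the failure would have to lie above the network layer; if only the host name is reachable, it would point to routing or load balancer configuration instead.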
Why does this not work? Why can we only submit jobs using the hostname (ip-10-0-1-95.eu-central-1.compute.internal) of the job manager and not the IP or the VIP?
Thank you!
Best regards
Wei
Attachment: output.txt (5K)