Unable to start session cluster using Docker

Vinay Patil
Hi,

I used the docker-compose file to create the cluster as shown in the documentation. The web UI starts successfully; however, the task managers are unable to join.

Job Manager container logs:

2018-10-04 18:13:13,907 INFO  org.apache.flink.runtime.dispatcher.DispatcherRestEndpoint    - Rest endpoint listening at cluster:8081

2018-10-04 18:13:13,907 INFO  org.apache.flink.runtime.dispatcher.DispatcherRestEndpoint    - http://cluster:8081 was granted leadership with leaderSessionID=00000000-0000-0000-0000-000000000000

2018-10-04 18:13:13,907 INFO  org.apache.flink.runtime.dispatcher.DispatcherRestEndpoint    - Web frontend listening at http://cluster:8081

2018-10-04 18:13:14,012 INFO  org.apache.flink.runtime.resourcemanager.StandaloneResourceManager  - ResourceManager akka.tcp://flink@cluster:6123/user/resourcemanager was granted leadership with fencing token 00000000000000000000000000000000

2018-10-04 18:13:14,013 INFO  org.apache.flink.runtime.resourcemanager.slotmanager.SlotManager  - Starting the SlotManager.

2018-10-04 18:13:14,026 INFO  org.apache.flink.runtime.dispatcher.StandaloneDispatcher      - Dispatcher akka.tcp://flink@cluster:6123/user/dispatcher was granted leadership with fencing token 00000000-0000-0000-0000-000000000000


I am not sure why the log says "Web frontend listening at http://cluster:8081" when the job manager RPC address is set to jobmanager.

Task Manager Container Logs:

2018-10-04 18:19:18,818 INFO  org.apache.flink.runtime.taskexecutor.TaskExecutor            - Connecting to ResourceManager akka.tcp://flink@jobmanager:6123/user/resourcemanager(00000000000000000000000000000000).

2018-10-04 18:19:18,818 INFO  org.apache.flink.runtime.filecache.FileCache                  - User file cache uses directory /tmp/flink-dist-cache-1bd95c51-3031-42ab-b782-14a0023921e5

2018-10-04 18:19:28,850 INFO  org.apache.flink.runtime.taskexecutor.TaskExecutor            - Could not resolve ResourceManager address akka.tcp://flink@jobmanager:6123/user/resourcemanager, retrying in 10000 ms: Ask timed out on [ActorSelection[Anchor(akka.tcp://flink@jobmanager:6123/), Path(/user/resourcemanager)]] after [10000 ms]. Sender[null] sent message of type "akka.actor.Identify".



I have even tried setting JOB_MANAGER_RPC_ADDRESS=cluster in the docker-compose file, but it does not work.
Both "cluster" and "jobmanager" also point to localhost in the /etc/hosts file.

Can you please let me know what the issue is here?

Regards,
Vinay Patil
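
For anyone comparing the two logs above: the JobManager registers its ResourceManager under the host name "cluster", while the TaskManager tries to reach it under "jobmanager". A small sketch for spotting such a mismatch is shown here. The helper name akka_host is made up for illustration, and the two log lines are abbreviated copies of the ones in this post. Such a mismatch would likely be enough for the remote actor lookup to time out, although this post does not confirm that.

```shell
#!/usr/bin/env bash
# akka_host is a hypothetical helper (not part of Flink): it prints the host
# of the first akka.tcp://flink@HOST:PORT address found in a log line.
akka_host() {
    printf '%s\n' "$1" | grep -o 'akka\.tcp://flink@[^:/]*' | head -n 1 | sed 's|.*@||'
}

# Abbreviated log lines taken from the JobManager and TaskManager logs above.
jm_line='ResourceManager akka.tcp://flink@cluster:6123/user/resourcemanager was granted leadership'
tm_line='Connecting to ResourceManager akka.tcp://flink@jobmanager:6123/user/resourcemanager'

advertised="$(akka_host "$jm_line")"
target="$(akka_host "$tm_line")"

# Report when the address the JobManager advertises differs from the one the
# TaskManager dials.
if [ "$advertised" != "$target" ]; then
    echo "mismatch: JobManager advertises '$advertised', TaskManager dials '$target'"
fi
```

With the lines from this post, the script reports "cluster" versus "jobmanager".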
Re: Unable to start session cluster using Docker

Till Rohrmann
Hi Vinay,

Are you referring to flink-contrib/docker-flink/docker-compose.yml? We recently fixed the command line parsing in Flink 1.5.4 and 1.6.1. As a result, the removal of the second command line parameter, which was intended to be introduced with 1.5.0 and 1.6.0 (see https://issues.apache.org/jira/browse/FLINK-8696), became visible. The docker-compose.yml file has not been updated yet. I will do this right away and push the changes to the 1.5, 1.6 and master branches. Sorry for the inconvenience. As a local fix, please go to flink-contrib/docker-flink/docker-entrypoint.sh:33 and remove the cluster parameter from that line.

Cheers,
Till
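
A rough illustration of why the leftover argument matters, for readers without the script at hand. The function jobmanager_stub below is a hypothetical stand-in, not the real bin/jobmanager.sh, and the exact contents of docker-entrypoint.sh line 33 are not quoted in this thread. The sketch assumes that the argument following the start mode is read as the host name, which would match the "cluster:8081" seen in the JobManager log but is an assumption.

```shell
#!/usr/bin/env bash
# Hypothetical stand-in for the start script: it only mimics an ASSUMED
# positional convention in which the argument after the start mode is the
# host name, falling back to JOB_MANAGER_RPC_ADDRESS when it is absent.
jobmanager_stub() {
    _mode="$1"
    _host="${2:-${JOB_MANAGER_RPC_ADDRESS:-localhost}}"
    echo "mode=${_mode} host=${_host}"
}

JOB_MANAGER_RPC_ADDRESS=jobmanager

# Before the local fix: the stray "cluster" argument is taken as the host.
jobmanager_stub start-foreground cluster   # prints: mode=start-foreground host=cluster

# After the local fix: no second argument, so the configured address is used.
jobmanager_stub start-foreground           # prints: mode=start-foreground host=jobmanager
```

Under that assumption, removing the parameter lets the JobManager come up under the name the TaskManagers are configured to contact.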

On Thu, Oct 4, 2018 at 8:30 PM Vinay Patil <[hidden email]> wrote:
Re: Unable to start session cluster using Docker

Vinay Patil

Thank you, Till. I am able to start the session cluster now.

Regards,
Vinay Patil


On Fri, Oct 5, 2018 at 8:15 PM Till Rohrmann <[hidden email]> wrote: