Flink with Mesos: Fetcher error

ani.desh1512
I am trying to configure Flink to run on top of Mesos. I am using Flink release-1.3 with the Mesos that ships with DC/OS 1.9, which is version 1.2. Flink starts without any issues when the TaskManager is launched on the same host as the appmaster, but when a TaskManager is launched on a different host, its container fails to launch. The Flink mesos-appmaster log looks like this:

2017-06-08 19:19:01,537 INFO  org.apache.flink.mesos.runtime.clusterframework.MesosFlinkResourceManager  - Launching Mesos task taskmanager-00003 on host 10.101.2.117.
2017-06-08 19:19:01,550 INFO  org.apache.flink.mesos.runtime.clusterframework.MesosFlinkResourceManager  - Launching Mesos task taskmanager-00002 on host 10.101.2.117.
2017-06-08 19:19:01,607 INFO  org.apache.flink.mesos.runtime.clusterframework.MesosFlinkResourceManager  - Launching Mesos task taskmanager-00001 on host 10.101.2.117.
2017-06-08 19:19:01,623 INFO  org.apache.flink.mesos.runtime.clusterframework.MesosFlinkResourceManager  - Launching Mesos task taskmanager-00004 on host 10.101.2.117.
2017-06-08 19:19:01,645 INFO  org.apache.flink.mesos.runtime.clusterframework.MesosFlinkResourceManager  - Launching Mesos task taskmanager-00006 on host 10.101.2.91.
2017-06-08 19:19:01,660 INFO  org.apache.flink.mesos.runtime.clusterframework.MesosFlinkResourceManager  - Launching Mesos task taskmanager-00005 on host 10.101.2.91.
2017-06-08 19:19:01,674 INFO  org.apache.flink.mesos.runtime.clusterframework.MesosFlinkResourceManager  - Launching Mesos task taskmanager-00007 on host 10.101.2.91.
2017-06-08 19:19:02,234 WARN  org.apache.flink.mesos.scheduler.TaskMonitor                  - Mesos task taskmanager-00003 failed unexpectedly.
2017-06-08 19:19:02,234 WARN  org.apache.flink.mesos.scheduler.TaskMonitor                  - Mesos task taskmanager-00002 failed unexpectedly.
2017-06-08 19:19:02,245 INFO  org.apache.flink.mesos.runtime.clusterframework.MesosFlinkResourceManager  - Mesos task taskmanager-00002 failed, with a TaskManager in launch or registration. State: TASK_FAILED Reason: REASON_CONTAINER_LAUNCH_FAILED (Failed to launch container: Failed to fetch all URIs for container '125055b6-9a19-4d62-a019-5d8a4197c043' with exit status: 256)
2017-06-08 19:19:02,246 INFO  org.apache.flink.mesos.runtime.clusterframework.MesosFlinkResourceManager  - Diagnostics for task taskmanager-00002 in state TASK_FAILED : reason=REASON_CONTAINER_LAUNCH_FAILED message=Failed to launch container: Failed to fetch all URIs for container '125055b6-9a19-4d62-a019-5d8a4197c043' with exit status: 256
2017-06-08 19:19:02,247 INFO  org.apache.flink.mesos.runtime.clusterframework.MesosFlinkResourceManager  - Total number of failed tasks so far: 1
2017-06-08 19:19:02,252 INFO  org.apache.flink.mesos.runtime.clusterframework.MesosFlinkResourceManager  - Mesos task taskmanager-00003 failed, with a TaskManager in launch or registration. State: TASK_FAILED Reason: REASON_CONTAINER_LAUNCH_FAILED (Failed to launch container: Failed to fetch all URIs for container '69259a92-b3e4-44c7-9afd-3ac650524570' with exit status: 256)
2017-06-08 19:19:02,252 INFO  org.apache.flink.mesos.runtime.clusterframework.MesosFlinkResourceManager  - Diagnostics for task taskmanager-00003 in state TASK_FAILED : reason=REASON_CONTAINER_LAUNCH_FAILED message=Failed to launch container: Failed to fetch all URIs for container '69259a92-b3e4-44c7-9afd-3ac650524570' with exit status: 256
2017-06-08 19:19:02,252 INFO  org.apache.flink.mesos.runtime.clusterframework.MesosFlinkResourceManager  - Total number of failed tasks so far: 2
2017-06-08 19:19:02,313 INFO  org.apache.flink.mesos.runtime.clusterframework.MesosFlinkResourceManager  - Scheduling Mesos task taskmanager-00008 with (2048.0 MB, 1.0 cpus).
2017-06-08 19:19:02,330 INFO  org.apache.flink.mesos.runtime.clusterframework.MesosFlinkResourceManager  - Scheduling Mesos task taskmanager-00009 with (2048.0 MB, 1.0 cpus).
2017-06-08 19:19:02,331 INFO  org.apache.flink.mesos.scheduler.LaunchCoordinator            - Now gathering offers for at least 2 task(s).
2017-06-08 19:19:02,332 WARN  org.apache.flink.mesos.scheduler.TaskMonitor                  - Mesos task taskmanager-00004 failed unexpectedly.
2017-06-08 19:19:02,332 WARN  org.apache.flink.mesos.scheduler.TaskMonitor                  - Mesos task taskmanager-00001 failed unexpectedly.
2017-06-08 19:19:02,412 INFO  org.apache.flink.mesos.runtime.clusterframework.MesosFlinkResourceManager  - Mesos task taskmanager-00004 failed, with a TaskManager in launch or registration. State: TASK_FAILED Reason: REASON_CONTAINER_LAUNCH_FAILED (Failed to launch container: Failed to fetch all URIs for container 'a65c3e35-579d-4302-830f-be50b6d0ca06' with exit status: 256)
2017-06-08 19:19:02,412 INFO  org.apache.flink.mesos.runtime.clusterframework.MesosFlinkResourceManager  - Diagnostics for task taskmanager-00004 in state TASK_FAILED : reason=REASON_CONTAINER_LAUNCH_FAILED message=Failed to launch container: Failed to fetch all URIs for container 'a65c3e35-579d-4302-830f-be50b6d0ca06' with exit status: 256
2017-06-08 19:19:02,412 INFO  org.apache.flink.mesos.runtime.clusterframework.MesosFlinkResourceManager  - Total number of failed tasks so far: 3
2017-06-08 19:19:02,432 INFO  org.apache.flink.mesos.runtime.clusterframework.MesosFlinkResourceManager  - Mesos task taskmanager-00001 failed, with a TaskManager in launch or registration. State: TASK_FAILED Reason: REASON_CONTAINER_LAUNCH_FAILED (Failed to launch container: Failed to fetch all URIs for container '325e14fe-8840-4996-96dc-5c7ffc159d12' with exit status: 256)


I checked stderr in the Mesos sandbox, and it shows the following:

I0608 19:20:06.184386 30480 fetcher.cpp:531] Fetcher Info: {"cache_directory":"\/tmp\/mesos\/fetch\/slaves\/6b7667c0-1b1a-43a4-ba1f-27cb0660608f-S6\/flink","items":[{"action":"BYPASS_CACHE","uri":{"cache":true,"executable":true,"extract":false,"output_file":"flink\/bin\/mesos-taskmanager.sh","value":"http:\/\/localhost:38985\/567dfcb8-f7d7-4d53-8518-53c1b3e7ef78\/flink\/bin\/mesos-taskmanager.sh"}},{"action":"BYPASS_CACHE","uri":{"cache":true,"executable":true,"extract":false,"output_file":"flink\/bin\/yarn-session.sh","value":"http:\/\/localhost:38985\/567dfcb8-f7d7-4d53-8518-53c1b3e7ef78\/flink\/bin\/yarn-session.sh"}},{"action":"BYPASS_CACHE","uri":{"cache":true,"executable":false,"extract":false,"output_file":"flink\/conf\/log4j-console.properties","value":"http:\/\/localhost:38985\/567dfcb8-f7d7-4d53-8518-53c1b3e7ef78\/flink\/conf\/log4j-console.properties"}},{"action":"BYPASS_CACHE","uri":{"cache":true,"executable":false,"extract":false,"output_file":"flink\/conf\/log4j.properties","value":"http:\/\/localhost:38985\/567dfcb8-f7d7-4d53-8518-53c1b3e7ef78\/flink\/conf\/log4j.properties"}},{"action":"BYPASS_CACHE","uri":{"cache":true,"executable":false,"extract":false,"output_file":"flink\/lib\/log4j-1.2.17.jar","value":"http:\/\/localhost:38985\/567dfcb8-f7d7-4d53-8518-53c1b3e7ef78\/flink\/lib\/log4j-1.2.17.jar"}},{"action":"BYPASS_CACHE","uri":{"cache":true,"executable":true,"extract":false,"output_file":"flink\/bin\/mesos-appmaster.sh","value":"http:\/\/localhost:38985\/567dfcb8-f7d7-4d53-8518-53c1b3e7ef78\/flink\/bin\/mesos-appmaster.sh"}},{"action":"BYPASS_CACHE","uri":{"cache":true,"executable":true,"extract":false,"output_file":"flink\/bin\/stop-zookeeper-quorum.sh","value":"http:\/\/localhost:38985\/567dfcb8-f7d7-4d53-8518-53c1b3e7ef78\/flink\/bin\/stop-zookeeper-quorum.sh"}},{"action":"BYPASS_CACHE","uri":{"cache":true,"executable":true,"extract":false,"output_file":"flink\/bin\/stop-local.sh","value":"http:\/\/localhost:38985\/567dfcb8-f7d7-4d53-8518-53c1b3e7ef78\/flink\/bin\/stop-local.sh"}},{"action":"BYPASS_CACHE","uri":{"cache":true,"executable":true,"extract":false,"output_file":"flink\/bin\/taskmanager.sh","value":"http:\/\/localhost:38985\/567dfcb8-f7d7-4d53-8518-53c1b3e7ef78\/flink\/bin\/taskmanager.sh"}},{"action":"BYPASS_CACHE","uri":{"cache":true,"executable":true,"extract":false,"output_file":"flink\/bin\/start-local.bat","value":"http:\/\/localhost:38985\/567dfcb8-f7d7-4d53-8518-53c1b3e7ef78\/flink\/bin\/start-local.bat"}},{"action":"BYPASS_CACHE","uri":{"cache":true,"executable":true,"extract":false,"output_file":"flink\/bin\/start-cluster.sh","value":"http:\/\/localhost:38985\/567dfcb8-f7d7-4d53-8518-53c1b3e7ef78\/flink\/bin\/start-cluster.sh"}},{"action":"BYPASS_CACHE","uri":{"cache":true,"executable":true,"extract":false,"output_file":"flink\/bin\/stop-cluster.sh","value":"http:\/\/localhost:38985\/567dfcb8-f7d7-4d53-8518-53c1b3e7ef78\/flink\/bin\/stop-cluster.sh"}},{"action":"BYPASS_CACHE","uri":{"cache":true,"executable":true,"extract":false,"output_file":"flink\/bin\/start-scala-shell.sh","value":"http:\/\/localhost:38985\/567dfcb8-f7d7-4d53-8518-53c1b3e7ef78\/flink\/bin\/start-scala-shell.sh"}},{"action":"BYPASS_CACHE","uri":{"cache":true,"executable":true,"extract":false,"output_file":"flink\/bin\/flink","value":"http:\/\/localhost:38985\/567dfcb8-f7d7-4d53-8518-53c1b3e7ef78\/flink\/bin\/flink"}},{"action":"BYPASS_CACHE","uri":{"cache":true,"executable":true,"extract":false,"output_file":"flink\/bin\/pyflink.sh","value":"http:\/\/localhost:38985\/567dfcb8-f7d7-4d53-8518-53c1b3e7ef78\/flink\/bin\/pyflink.sh"}},{"action":"BYPASS_CACHE","uri":{"cache":true,"executable":false,"extract":false,"output_file":"flink\/conf\/log4j-yarn-session.properties","value":"http:\/\/localhost:38985\/567dfcb8-f7d7-4d53-8518-53c1b3e7ef78\/flink\/conf\/log4j-yarn-session.properties"}},{"action":"BYPASS_CACHE","uri":{"cache":true,"executable":false,"extract":false,"output_file":"flink\/conf\/logback-yarn.xml","value":"http:\/\/localhost:38985\/567dfcb8-f7d7-4d53-8518-53c1b3e7ef78\/flink\/conf\/logback-yarn.xml"}},{"action":"BYPASS_CACHE","uri":{"cache":true,"executable":true,"extract":false,"output_file":"flink\/bin\/flink-daemon.sh","value":"http:\/\/localhost:38985\/567dfcb8-f7d7-4d53-8518-53c1b3e7ef78\/flink\/bin\/flink-daemon.sh"}},{"action":"BYPASS_CACHE","uri":{"cache":true,"executable":true,"extract":false,"output_file":"flink\/bin\/zookeeper.sh","value":"http:\/\/localhost:38985\/567dfcb8-f7d7-4d53-8518-53c1b3e7ef78\/flink\/bin\/zookeeper.sh"}},{"action":"BYPASS_CACHE","uri":{"cache":true,"executable":false,"extract":false,"output_file":"flink\/conf\/logback-console.xml","value":"http:\/\/localhost:38985\/567dfcb8-f7d7-4d53-8518-53c1b3e7ef78\/flink\/conf\/logback-console.xml"}},{"action":"BYPASS_CACHE","uri":{"cache":true,"executable":false,"extract":false,"output_file":"flink\/conf\/masters","value":"http:\/\/localhost:38985\/567dfcb8-f7d7-4d53-8518-53c1b3e7ef78\/flink\/conf\/masters"}},{"action":"BYPASS_CACHE","uri":{"cache":true,"executable":true,"extract":false,"output_file":"flink\/conf\/flink-conf.yaml","value":"http:\/\/localhost:38985\/567dfcb8-f7d7-4d53-8518-53c1b3e7ef78\/flink\/conf\/flink-conf.yaml"}},{"action":"BYPASS_CACHE","uri":{"cache":true,"executable":false,"extract":false,"output_file":"flink\/conf\/zoo.cfg","value":"http:\/\/localhost:38985\/567dfcb8-f7d7-4d53-8518-53c1b3e7ef78\/flink\/conf\/zoo.cfg"}},{"action":"BYPASS_CACHE","uri":{"cache":true,"executable":false,"extract":false,"output_file":"flink\/lib\/flink-shaded-hadoop2-uber-1.3-SNAPSHOT.jar","value":"http:\/\/localhost:38985\/567dfcb8-f7d7-4d53-8518-53c1b3e7ef78\/flink\/lib\/flink-shaded-hadoop2-uber-1.3-SNAPSHOT.jar"}},{"action":"BYPASS_CACHE","uri":{"cache":true,"executable":false,"extract":false,"output_file":"flink\/conf\/slaves","value":"http:\/\/localhost:38985\/567dfcb8-f7d7-4d53-8518-53c1b3e7ef78\/flink\/conf\/slaves"}},{"action":"BYPASS_CACHE","uri":{"cache":true,"executable":false,"extract":false,"output_file":"flink\/lib\/flink-dist_2.10-1.3-SNAPSHOT.jar","value":"http:\/\/localhost:38985\/567dfcb8-f7d7-4d53-8518-53c1b3e7ef78\/flink\/lib\/flink-dist_2.10-1.3-SNAPSHOT.jar"}},{"action":"BYPASS_CACHE","uri":{"cache":true,"executable":false,"extract":false,"output_file":"flink\/lib\/slf4j-log4j12-1.7.7.jar","value":"http:\/\/localhost:38985\/567dfcb8-f7d7-4d53-8518-53c1b3e7ef78\/flink\/lib\/slf4j-log4j12-1.7.7.jar"}},{"action":"BYPASS_CACHE","uri":{"cache":true,"executable":false,"extract":false,"output_file":"flink\/conf\/log4j-cli.properties","value":"http:\/\/localhost:38985\/567dfcb8-f7d7-4d53-8518-53c1b3e7ef78\/flink\/conf\/log4j-cli.properties"}},{"action":"BYPASS_CACHE","uri":{"cache":true,"executable":true,"extract":false,"output_file":"flink\/bin\/historyserver.sh","value":"http:\/\/localhost:38985\/567dfcb8-f7d7-4d53-8518-53c1b3e7ef78\/flink\/bin\/historyserver.sh"}},{"action":"BYPASS_CACHE","uri":{"cache":true,"executable":false,"extract":false,"output_file":"flink\/lib\/flink-python_2.10-1.3-SNAPSHOT.jar","value":"http:\/\/localhost:38985\/567dfcb8-f7d7-4d53-8518-53c1b3e7ef78\/flink\/lib\/flink-python_2.10-1.3-SNAPSHOT.jar"}},{"action":"BYPASS_CACHE","uri":{"cache":true,"executable":false,"extract":false,"output_file":"flink\/conf\/logback.xml","value":"http:\/\/localhost:38985\/567dfcb8-f7d7-4d53-8518-53c1b3e7ef78\/flink\/conf\/logback.xml"}},{"action":"BYPASS_CACHE","uri":{"cache":true,"executable":true,"extract":false,"output_file":"flink\/bin\/pyflink.bat","value":"http:\/\/localhost:38985\/567dfcb8-f7d7-4d53-8518-53c1b3e7ef78\/flink\/bin\/pyflink.bat"}},{"action":"BYPASS_CACHE","uri":{"cache":true,"executable":true,"extract":false,"output_file":"flink\/bin\/start-local.sh","value":"http:\/\/localhost:38985\/567dfcb8-f7d7-4d53-8518-53c1b3e7ef78\/flink\/bin\/start-local.sh"}},{"action":"BYPASS_CACHE","uri":{"cache":true,"executable":true,"extract":false,"output_file":"flink\/bin\/flink.bat","value":"http:\/\/localhost:38985\/567dfcb8-f7d7-4d53-8518-53c1b3e7ef78\/flink\/bin\/flink.bat"}},{"action":"BYPASS_CACHE","uri":{"cache":true,"executable":true,"extract":false,"output_file":"flink\/bin\/start-zookeeper-quorum.sh","value":"http:\/\/localhost:38985\/567dfcb8-f7d7-4d53-8518-53c1b3e7ef78\/flink\/bin\/start-zookeeper-quorum.sh"}},{"action":"BYPASS_CACHE","uri":{"cache":true,"executable":true,"extract":false,"output_file":"flink\/bin\/jobmanager.sh","value":"http:\/\/localhost:38985\/567dfcb8-f7d7-4d53-8518-53c1b3e7ef78\/flink\/bin\/jobmanager.sh"}},{"action":"BYPASS_CACHE","uri":{"cache":true,"executable":true,"extract":false,"output_file":"flink\/bin\/flink-console.sh","value":"http:\/\/localhost:38985\/567dfcb8-f7d7-4d53-8518-53c1b3e7ef78\/flink\/bin\/flink-console.sh"}},{"action":"BYPASS_CACHE","uri":{"cache":true,"executable":true,"extract":false,"output_file":"flink\/bin\/config.sh","value":"http:\/\/localhost:38985\/567dfcb8-f7d7-4d53-8518-53c1b3e7ef78\/flink\/bin\/config.sh"}}],"sandbox_directory":"\/var\/lib\/mesos\/slave\/slaves\/6b7667c0-1b1a-43a4-ba1f-27cb0660608f-S6\/frameworks\/6b7667c0-1b1a-43a4-ba1f-27cb0660608f-0030\/executors\/taskmanager-00009\/runs\/d8d1756d-f977-43f6-a53f-55c19b6c6294","user":"flink"}
I0608 19:20:06.189909 30480 fetcher.cpp:442] Fetching URI 'http://localhost:38985/567dfcb8-f7d7-4d53-8518-53c1b3e7ef78/flink/bin/mesos-taskmanager.sh'
I0608 19:20:06.189932 30480 fetcher.cpp:283] Fetching directly into the sandbox directory
I0608 19:20:06.190213 30480 fetcher.cpp:220] Fetching URI 'http://localhost:38985/567dfcb8-f7d7-4d53-8518-53c1b3e7ef78/flink/bin/mesos-taskmanager.sh'
I0608 19:20:06.190251 30480 fetcher.cpp:163] Downloading resource from 'http://localhost:38985/567dfcb8-f7d7-4d53-8518-53c1b3e7ef78/flink/bin/mesos-taskmanager.sh' to '/var/lib/mesos/slave/slaves/6b7667c0-1b1a-43a4-ba1f-27cb0660608f-S6/frameworks/6b7667c0-1b1a-43a4-ba1f-27cb0660608f-0030/executors/taskmanager-00009/runs/d8d1756d-f977-43f6-a53f-55c19b6c6294/flink/bin/mesos-taskmanager.sh'
Failed to fetch 'http://localhost:38985/567dfcb8-f7d7-4d53-8518-53c1b3e7ef78/flink/bin/mesos-taskmanager.sh': Error downloading resource: Couldn't connect to server
Failed to synchronize with agent (it's probably exited)
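
The stderr shows the agent being told to download every file from localhost:38985. A minimal Python 3 sketch (standard library only) for checking this kind of failure from the agent side: it pulls the distinct download endpoints out of a "Fetcher Info" line and does a plain TCP connect test. The helper names fetch_endpoints, is_loopback and can_connect are hypothetical, not part of Mesos or Flink, and the parsing assumes the line has the `Fetcher Info: {json}` shape seen in the stderr above.

```python
import json
import socket
from urllib.parse import urlsplit

MARKER = "Fetcher Info: "
LOOPBACK_NAMES = {"localhost", "localhost.localdomain", "::1"}


def fetch_endpoints(log_line):
    """Distinct (host, port) pairs that a Mesos 'Fetcher Info' stderr line
    asks the agent to download from (hypothetical helper)."""
    start = log_line.index(MARKER) + len(MARKER)  # ValueError if the marker is absent
    info = json.loads(log_line[start:])  # json.loads turns the "\/" escapes back into "/"
    endpoints = set()
    for item in info["items"]:
        parts = urlsplit(item["uri"]["value"])
        endpoints.add((parts.hostname, parts.port))
    # key=str keeps sorting safe even if a URI has no explicit port (None).
    return sorted(endpoints, key=str)


def is_loopback(host):
    """True for names that always refer to the machine doing the lookup,
    i.e. names another agent cannot use to reach the appmaster."""
    return host is not None and (host in LOOPBACK_NAMES or host.startswith("127."))


def can_connect(host, port, timeout=3.0):
    """Plain TCP connect test; meant to be run on the agent that failed to fetch."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, timed out, or name not resolvable
        return False
```

Given the Fetcher Info line above, fetch_endpoints returns [('localhost', 38985)], and is_loopback flags it. On an agent other than the appmaster's host, can_connect('localhost', 38985) would be expected to return False, matching the "Couldn't connect to server" error.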


So, what am I missing?
Do I need to specify some special URI in Marathon for Flink? I am setting mesos.master to zk://leader.mesos:2181/mesos. Could that be what is causing the problem?
Or have I missed some Mesos or Marathon setting?
Also, I am launching this via Marathon, and I have the same Flink distribution at the same path on all the slaves.

Thanks,

Re: Flink with Mesos: Fetcher error

Till Rohrmann

Hi Ani,

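the problem is that you have to set a JobManager hostname in flink-conf.yaml that the other hosts can reach, via jobmanager.rpc.address: [reachable hostname]. I assume that you are using the default value, which is localhost. You can see this in the fetcher info, where the URLs of all the files point to localhost:38985. On another agent, localhost refers to that agent itself, where nothing is listening on port 38985, hence "Couldn't connect to server"; on the appmaster's own host, localhost happens to be correct, which is why TaskManagers launched there start fine. Setting this value to the external hostname of the machine on which the JobManager is running should solve the problem.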
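
Since the appmaster is started through Marathon, it may land on a different agent each time, so a fixed hostname in flink-conf.yaml is not always an option. A minimal sketch of one way to handle that, assuming a small Python 3 wrapper that runs on the appmaster's host before bin/mesos-appmaster.sh, and assuming the agent's hostname is exported in the HOST environment variable (Marathon normally sets this for its tasks, but verify it for your setup). The helpers set_conf_value, reachable_hostname and patch_conf_file are hypothetical, and the edit is line-based rather than a real YAML parser.

```python
import os
import re
import socket

KEY = "jobmanager.rpc.address"


def set_conf_value(conf_text, key, value):
    """Return flink-conf.yaml text with `key: value` set.

    Every uncommented `key:` line is replaced; if there is none, the entry is
    appended at the end. Commented-out lines (starting with '#') are left alone.
    """
    pattern = re.compile(rf"^{re.escape(key)}[ \t]*:.*$", re.MULTILINE)
    new_line = f"{key}: {value}"
    if pattern.search(conf_text):
        # A function replacement keeps any backslashes in `value` literal.
        return pattern.sub(lambda _match: new_line, conf_text)
    if conf_text and not conf_text.endswith("\n"):
        conf_text += "\n"
    return conf_text + new_line + "\n"


def reachable_hostname():
    """Hostname that other agents should use to reach this machine.

    Assumption: the launcher (e.g. Marathon) exports the agent's hostname in
    $HOST. socket.getfqdn() is only a fallback and can itself resolve to
    localhost on a badly configured machine, so check what it returns.
    """
    return os.environ.get("HOST") or socket.getfqdn()


def patch_conf_file(path):
    """Rewrite jobmanager.rpc.address in the given flink-conf.yaml in place."""
    host = reachable_hostname()
    with open(path, encoding="utf-8") as f:
        text = f.read()
    with open(path, "w", encoding="utf-8") as f:
        f.write(set_conf_value(text, KEY, host))
    return host
```

For example, patch_conf_file("flink/conf/flink-conf.yaml") turns a line such as jobmanager.rpc.address: localhost into one pointing at the agent's hostname and returns the value it wrote. Logging that value makes it easy to compare with the URLs in the next Fetcher Info line.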

Cheers,
Till


On Thu, Jun 8, 2017 at 11:21 PM, ani.desh1512 <[hidden email]> wrote:
--
View this message in context: http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/Flink-with-Mesos-Fetcher-error-tp13603.html
Sent from the Apache Flink User Mailing List archive at Nabble.com.