I want to run my Flink program on a Mesos cluster via Marathon. I created an application with this JSON file in Marathon:

{
  "id": "flink",
  "cmd": "/home/flink-1.7.0/bin/mesos-appmaster.sh -Djobmanager.heap.mb=1024 -Djobmanager.rpc.port=6123 -Drest.port=8081 -Dmesos.resourcemanager.tasks.mem=1024 -Dtaskmanager.heap.mb=1024 -Dtaskmanager.numberOfTaskSlots=2 -Dparallelism.default=2 -Dmesos.resourcemanager.tasks.cpus=1",
  "cpus": 1.0,
  "mem": 1024
}

The task failed with this error:

I0303 09:41:52.841243 2594 exec.cpp:162] Version: 1.7.0
I0303 09:41:52.851898 2593 exec.cpp:236] Executor registered on agent d9a98175-b93c-4600-a41b-fe91fae5486a-S0
I0303 09:41:52.854436 2594 executor.cpp:182] Received SUBSCRIBED event
I0303 09:41:52.855284 2594 executor.cpp:186] Subscribed executor on 172.28.10.136
I0303 09:41:52.855479 2594 executor.cpp:182] Received LAUNCH event
I0303 09:41:52.855932 2594 executor.cpp:679] Starting task ffff.933fdd2f-3d98-11e9-bbc4-0242a78449af
I0303 09:41:52.868172 2594 executor.cpp:499] Running '/home/mesos-1.7.0/build/src/mesos-containerizer launch <POSSIBLY-SENSITIVE-DATA>'
I0303 09:41:52.872699 2594 executor.cpp:693] Forked command at 2599
I0303 09:41:54.050284 2596 executor.cpp:994] Command exited with status 1 (pid: 2599)
I0303 09:41:55.052323 2598 process.cpp:926] Stopped the socket accept loop

I configured Zookeeper, Mesos, Marathon and Flink, and they all run in Docker. I ran a simple command like "echo "hello" >> /home/output.txt" without any problems.

I really do not know what is going on; I am confused. Could anyone please tell me what is wrong here?

Any help would be appreciated.

Many thanks.

--
Sent from: http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/
Hi,
With just this information it might be difficult to help. Please look for some additional logs (has Flink managed to log anything?) or for the task's standard output/error. My guess is that this is some relatively simple mistake in the configuration, like file/directory read/write/execute permissions or something like that.

I guess you have seen/followed this? https://ci.apache.org/projects/flink/flink-docs-stable/ops/deployment/mesos.html

Piotrek
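For reference, a rough sketch of where such logs usually end up. The Mesos agent work directory below is only an assumption (it often differs on a source build), and the Flink paths are taken from the original post:

    # Open the failed task in the Mesos web UI and follow its "Sandbox" link,
    # or look on the agent directly; stderr usually shows why the command exited with status 1.
    find /var/lib/mesos -name stderr -path "*executors*" | xargs tail -n 50

    # Check whether Flink itself managed to write any logs (path from the original post):
    ls -l /home/flink-1.7.0/log/

    # Verify the appmaster script is readable and executable inside the container:
    ls -l /home/flink-1.7.0/bin/mesos-appmaster.sh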
Hi,
Flink per se does not require Hadoop to work. However, keep in mind that you need some kind of distributed/remote file system for the checkpoint mechanism to work: if one node writes a file for a checkpoint/savepoint, that file must be accessible from the other nodes after a restart/crash.

Piotrek
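As an illustration only, checkpoints are typically pointed at such a shared filesystem via flink-conf.yaml; the backend choice, HDFS address and paths below are placeholders, not settings from this thread:

    # flink-conf.yaml (sketch only -- address and paths are made-up examples)
    state.backend: filesystem
    state.checkpoints.dir: hdfs://namenode:8020/flink/checkpoints
    state.savepoints.dir: hdfs://namenode:8020/flink/savepoints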
Ok, thanks.