I want to run my Flink program on a Mesos cluster via Marathon. I created an application with this JSON file in Marathon:

{
  "id": "flink",
  "cmd": "/home/flink-1.7.0/bin/mesos-appmaster.sh -Djobmanager.heap.mb=1024 -Djobmanager.rpc.port=6123 -Drest.port=8081 -Dmesos.resourcemanager.tasks.mem=1024 -Dtaskmanager.heap.mb=1024 -Dtaskmanager.numberOfTaskSlots=2 -Dparallelism.default=2 -Dmesos.resourcemanager.tasks.cpus=1",
  "cpus": 1.0,
  "mem": 1024
}

The task failed with this error:

I0303 09:41:52.841243 2594 exec.cpp:162] Version: 1.7.0
I0303 09:41:52.851898 2593 exec.cpp:236] Executor registered on agent d9a98175-b93c-4600-a41b-fe91fae5486a-S0
I0303 09:41:52.854436 2594 executor.cpp:182] Received SUBSCRIBED event
I0303 09:41:52.855284 2594 executor.cpp:186] Subscribed executor on 172.28.10.136
I0303 09:41:52.855479 2594 executor.cpp:182] Received LAUNCH event
I0303 09:41:52.855932 2594 executor.cpp:679] Starting task ffff.933fdd2f-3d98-11e9-bbc4-0242a78449af
I0303 09:41:52.868172 2594 executor.cpp:499] Running '/home/mesos-1.7.0/build/src/mesos-containerizer launch <POSSIBLY-SENSITIVE-DATA>'
I0303 09:41:52.872699 2594 executor.cpp:693] Forked command at 2599
I0303 09:41:54.050284 2596 executor.cpp:994] Command exited with status 1 (pid: 2599)
I0303 09:41:55.052323 2598 process.cpp:926] Stopped the socket accept loop

I configured Zookeeper, Mesos, Marathon and Flink, and they all run in Docker. I ran a simple command like "echo "hello" >> /home/output.txt" without any problems.

I really do not know what is going on; I am confused. Could anyone please tell me what is wrong here?

Any help would be appreciated.

Many thanks.

--
Sent from: http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/
Hi,
With just this information it might be difficult to help. Please look for some additional logs (has Flink managed to log anything?) or for the task's standard output/error. My guess is that this is some relatively simple mistake in the configuration, like file/directory read/write/execute permissions or something like that.

I guess you have seen/followed this? https://ci.apache.org/projects/flink/flink-docs-stable/ops/deployment/mesos.html

Piotrek
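For reference, a rough sketch of where such logs usually end up. The Mesos agent work directory below is only an assumption (it often differs on a source build), and the Flink paths are taken from the original post:

    # Open the failed task in the Mesos web UI and follow its "Sandbox" link,
    # or look on the agent directly; stderr usually shows why the command exited with status 1.
    find /var/lib/mesos -name stderr -path "*executors*" | xargs tail -n 50

    # Check whether Flink itself managed to write any logs (path from the original post):
    ls -l /home/flink-1.7.0/log/

    # Verify the appmaster script is readable and executable inside the container:
    ls -l /home/flink-1.7.0/bin/mesos-appmaster.sh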
Hi,
Flink per se does not require Hadoop to work. However, keep in mind that you need some kind of distributed/remote file system for the checkpoint mechanism to work: if one node writes a file for a checkpoint/savepoint, that file must be accessible from the other nodes after a restart/crash.

Piotrek
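As an illustration only, checkpoints are typically pointed at such a shared filesystem via flink-conf.yaml; the backend choice, HDFS address and paths below are placeholders, not settings from this thread:

    # flink-conf.yaml (sketch only -- address and paths are made-up examples)
    state.backend: filesystem
    state.checkpoints.dir: hdfs://namenode:8020/flink/checkpoints
    state.savepoints.dir: hdfs://namenode:8020/flink/savepoints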
Ok, thanks.