Flink on Mesos: containers question

NEKRASSOV, ALEXEI

Can someone please clarify how Flink on Mesos is containerized?

On a 5-node Mesos cluster I started Flink 1.4.2 with two Task Managers. Mesos shows a “flink” task and two “taskmanager” tasks, all on the same VM.

On that VM I see one Docker container running a process that seems to be the Mesos Application Master:

 

$ docker ps -a
CONTAINER ID        IMAGE                             COMMAND                  CREATED             STATUS              PORTS               NAMES
97b6840466c0        mesosphere/dcos-flink:1.4.2-1.0   "/bin/sh -c /sbin/..."   41 hours ago        Up 41 hours                             mesos-a0079d85-9ccb-4c43-8d31-e6b1ad750197

$ docker exec 97b6840466c0 /bin/ps -efww
UID        PID  PPID  C STIME TTY          TIME CMD
root         1     0  0 Jul11 ?        00:00:00 /bin/sh -c /sbin/init.sh
root         7     1  0 Jul11 ?        00:00:02 runsvdir -P /etc/service
root         8     7  0 Jul11 ?        00:00:00 runsv flink
root       629     0  0 Jul12 pts/0    00:00:00 /bin/bash
root       789     8  1 Jul12 ?        00:09:16 /usr/lib/jvm/java-8-openjdk-amd64/jre/bin/java -classpath /flink-1.4.2/lib/flink-python_2.11-1.4.2.jar:/flink-1.4.2/lib/flink-shaded-hadoop2-uber-1.4.2.jar:/flink-1.4.2/lib/log4j-1.2.17.jar:/flink-1.4.2/lib/slf4j-log4j12-1.7.7.jar:/flink-1.4.2/lib/flink-dist_2.11-1.4.2.jar::/etc/hadoop/conf/: -Dlog.file=/mnt/mesos/sandbox/flink--mesos-appmaster-alex-tfc87d-private-agents-3.novalocal.log -Dlog4j.configuration=file:/flink-1.4.2/conf/log4j.properties -Dlogback.configurationFile=file:/flink-1.4.2/conf/logback.xml org.apache.flink.mesos.runtime.clusterframework.MesosApplicationMasterRunner -Dblob.server.port=23170 -Djobmanager.heap.mb=256 -Djobmanager.rpc.port=23169 -Djobmanager.web.port=23168 -Dmesos.artifact-server.port=23171 -Dmesos.initial-tasks=2 -Dmesos.resourcemanager.tasks.cpus=2 -Dmesos.resourcemanager.tasks.mem=2048 -Dtaskmanager.heap.mb=512 -Dtaskmanager.memory.preallocate=true -Dtaskmanager.numberOfTaskSlots=1 -Dparallelism.default=1 -Djobmanager.rpc.address=localhost -Dmesos.resourcemanager.framework.role=* -Dsecurity.kerberos.login.use-ticket-cache=true
root      1027     0  0 12:54 ?        00:00:00 /bin/ps -efww

 

Then on the VM itself I see another process with the same command line as the one in the container:

 

root     13276  9689  1 Jul12 ?        00:09:18 /usr/lib/jvm/java-8-openjdk-amd64/jre/bin/java -classpath /flink-1.4.2/lib/flink-python_2.11-1.4.2.jar:/flink-1.4.2/lib/flink-shaded-hadoop2-uber-1.4.2.jar:/flink-1.4.2/lib/log4j-1.2.17.jar:/flink-1.4.2/lib/slf4j-log4j12-1.7.7.jar:/flink-1.4.2/lib/flink-dist_2.11-1.4.2.jar::/etc/hadoop/conf/: -Dlog.file=/mnt/mesos/sandbox/flink--mesos-appmaster-alex-tfc87d-private-agents-3.novalocal.log -Dlog4j.configuration=file:/flink-1.4.2/conf/log4j.properties -Dlogback.configurationFile=file:/flink-1.4.2/conf/logback.xml org.apache.flink.mesos.runtime.clusterframework.MesosApplicationMasterRunner -Dblob.server.port=23170 -Djobmanager.heap.mb=256 -Djobmanager.rpc.port=23169 -Djobmanager.web.port=23168 -Dmesos.artifact-server.port=23171 -Dmesos.initial-tasks=2 -Dmesos.resourcemanager.tasks.cpus=2 -Dmesos.resourcemanager.tasks.mem=2048 -Dtaskmanager.heap.mb=512 -Dtaskmanager.memory.preallocate=true -Dtaskmanager.numberOfTaskSlots=1 -Dparallelism.default=1 -Djobmanager.rpc.address=localhost -Dmesos.resourcemanager.framework.role=* -Dsecurity.kerberos.login.use-ticket-cache=true

 

And I see two processes on the VM that seem to be related to the Task Managers:

 

root     13688 13687  0 Jul12 ?        00:04:25 /docker-java-home/jre/bin/java -Xms1448m -Xmx1448m -classpath /mnt/mesos/sandbox/flink/lib/flink-python_2.11-1.4.2.jar:/mnt/mesos/sandbox/flink/lib/flink-shaded-hadoop2-uber-1.4.2.jar:/mnt/mesos/sandbox/flink/lib/log4j-1.2.17.jar:/mnt/mesos/sandbox/flink/lib/slf4j-log4j12-1.7.7.jar:/mnt/mesos/sandbox/flink/lib/flink-dist_2.11-1.4.2.jar::: -Dlog.file=flink-taskmanager.log -Dlog4j.configuration=file:/mnt/mesos/sandbox/flink/conf/log4j.properties -Dlogback.configurationFile=file:/mnt/mesos/sandbox/flink/conf/logback.xml org.apache.flink.mesos.runtime.clusterframework.MesosTaskManager -Dblob.server.port=23170 -Dmesos.artifact-server.port=23171 -Djobmanager.heap.mb=256 -Djobmanager.rpc.address=localhost -Djobmanager.web.port=23168 -Dsecurity.kerberos.login.use-ticket-cache=true -Djobmanager.rpc.port=23169 -Dtaskmanager.memory.preallocate=true -Dtaskmanager.rpc.port=1027 -Dmesos.initial-tasks=2 -Dmesos.resourcemanager.tasks.cpus=2 -Dtaskmanager.maxRegistrationDuration=5 minutes -Dtaskmanager.data.port=1028 -Dparallelism.default=1 -Dtaskmanager.numberOfTaskSlots=1 -Dmesos.resourcemanager.tasks.mem=2048 -Dtaskmanager.heap.mb=512 -Dmesos.resourcemanager.framework.role=*

root     13892 13891  0 Jul12 ?        00:04:15 /docker-java-home/jre/bin/java -Xms1448m -Xmx1448m -classpath /mnt/mesos/sandbox/flink/lib/flink-python_2.11-1.4.2.jar:/mnt/mesos/sandbox/flink/lib/flink-shaded-hadoop2-uber-1.4.2.jar:/mnt/mesos/sandbox/flink/lib/log4j-1.2.17.jar:/mnt/mesos/sandbox/flink/lib/slf4j-log4j12-1.7.7.jar:/mnt/mesos/sandbox/flink/lib/flink-dist_2.11-1.4.2.jar::: -Dlog.file=flink-taskmanager.log -Dlog4j.configuration=file:/mnt/mesos/sandbox/flink/conf/log4j.properties -Dlogback.configurationFile=file:/mnt/mesos/sandbox/flink/conf/logback.xml org.apache.flink.mesos.runtime.clusterframework.MesosTaskManager -Dblob.server.port=23170 -Dmesos.artifact-server.port=23171 -Djobmanager.heap.mb=256 -Djobmanager.rpc.address=localhost -Djobmanager.web.port=23168 -Dsecurity.kerberos.login.use-ticket-cache=true -Djobmanager.rpc.port=23169 -Dtaskmanager.memory.preallocate=true -Dtaskmanager.rpc.port=1025 -Dmesos.initial-tasks=2 -Dmesos.resourcemanager.tasks.cpus=2 -Dtaskmanager.maxRegistrationDuration=5 minutes -Dtaskmanager.data.port=1026 -Dparallelism.default=1 -Dtaskmanager.numberOfTaskSlots=1 -Dmesos.resourcemanager.tasks.mem=2048 -Dtaskmanager.heap.mb=512 -Dmesos.resourcemanager.framework.role=*

 

But I don’t see any Docker containers for the Task Managers.

I thought the Task Managers might run directly on the VM (PIDs 13688 and 13892), but my code executing in the Task Managers has no access to the VM’s filesystem.

 

It is almost like there are more containers running than “docker ps” is showing me. Can someone clarify?
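A minimal sketch for checking this on the agent, assuming a Linux host with `/proc` and default cgroup layouts: Docker usually places its containers under a `/docker/<id>` cgroup (or a `docker-<id>.scope` unit with the systemd cgroup driver), while the Mesos containerizer, when its cgroups isolators are enabled, nests containers under the agent's `--cgroups_root`, which defaults to `mesos`. The function names and the script name are hypothetical, and the path patterns are a heuristic rather than a guarantee.

```python
import sys
from pathlib import Path


def classify_cgroup(cgroup_text: str) -> str:
    """Guess which runtime owns a process from the text of /proc/<pid>/cgroup.

    Heuristic only: Docker's default cgroup parent is "/docker/<id>" (or a
    "docker-<id>.scope" unit with the systemd cgroup driver), and the Mesos
    containerizer nests containers under its cgroups root, "/mesos" by default.
    """
    for line in cgroup_text.splitlines():
        # Each line has the form "hierarchy-id:controller-list:cgroup-path".
        parts = line.split(":", 2)
        if len(parts) != 3:
            continue
        path = parts[2]
        if path.startswith("/docker/") or "/docker-" in path:
            return "docker"
        if path.startswith("/mesos/"):
            return "mesos"
    return "host"


def classify_pid(pid: int) -> str:
    """Read /proc/<pid>/cgroup on a Linux host and classify the process."""
    return classify_cgroup(Path(f"/proc/{pid}/cgroup").read_text())


if __name__ == "__main__":
    # Hypothetical usage: python3 classify_pids.py 13276 13688 13892
    for arg in sys.argv[1:]:
        print(arg, classify_pid(int(arg)))
```

If the Task Manager PIDs were reported as `mesos`, that would suggest they run in containers created by the Mesos containerizer, which does not go through the Docker daemon and therefore does not appear in `docker ps`.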

Also, what is the relationship between PID 13276 on the VM and PID 789 in the container (the two processes with the same command line)?

 

Thanks!

Alex


Re: Flink on Mesos: containers question

Fabian Hueske
Hi Alexei,

Till (in CC) is familiar with Flink's Mesos support in 1.4.x.

Best, Fabian

2018-07-13 15:07 GMT+02:00 NEKRASSOV, ALEXEI <[hidden email]>:


RE: Flink on Mesos: containers question

NEKRASSOV, ALEXEI

Till,

Any insight into how Flink components are containerized in Mesos?

Thanks!

Alex

From: Fabian Hueske [mailto:[hidden email]]
Sent: Monday, July 16, 2018 7:57 AM
To: NEKRASSOV, ALEXEI <[hidden email]>
Cc: [hidden email]; Till Rohrmann <[hidden email]>
Subject: Re: Flink on Mesos: containers question

 


Re: Flink on Mesos: containers question

Till Rohrmann
Hi Alexei,

I have actually never used Mesos with container images. I have always used it in a way where the Mesos task directly starts the Java process.

Cheers,
Till

On Thu, Jul 19, 2018 at 2:44 PM NEKRASSOV, ALEXEI <[hidden email]> wrote:


Re: Flink on Mesos: containers question

Renjie Liu
Hi, Alexei:

What you pasted is the expected behavior. The JobManager and the two TaskManagers should each run in their own Docker container.
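If that is the case, the Task Managers should not share a mount namespace with the host, which would also explain why job code cannot see the VM’s files. A minimal sketch for comparing namespaces, assuming Linux and sufficient privileges (usually root) to read another process's `/proc/<pid>/ns/*` links; the function names are hypothetical:

```python
import os


def ns_link(pid: int, ns: str = "mnt") -> str:
    """Return the target of /proc/<pid>/ns/<ns>, e.g. "mnt:[4026531840]"."""
    return os.readlink(f"/proc/{pid}/ns/{ns}")


def same_namespace(pid_a: int, pid_b: int, ns: str = "mnt") -> bool:
    """True if both processes are members of the same namespace of type `ns`.

    Two processes share a namespace exactly when their ns links point at the
    same namespace inode. Reading another user's links usually requires root.
    """
    return ns_link(pid_a, ns) == ns_link(pid_b, ns)
```

For example, `same_namespace(1, 13688)` returning `False` would indicate that Task Manager 13688 has its own view of the filesystem; passing `ns="pid"` compares PID namespaces instead.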

13276 should be the JobManager process, and it is the same process as 789. They have different process IDs because you are seeing them from different PID namespaces (namespaces are a Linux kernel feature, separate from cgroups; Docker relies on both).
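On kernels 4.1 and later, the `NSpid` line in `/proc/<pid>/status` lists a process's PID in every PID namespace it belongs to, outermost first, so the mapping between 13276 and 789 can be read directly on the host. A minimal sketch with hypothetical function names:

```python
from __future__ import annotations

from pathlib import Path


def parse_nspid(status_text: str) -> list[int]:
    """Return the PIDs listed on the NSpid line of /proc/<pid>/status.

    The first entry is the PID as seen from the namespace that mounted /proc
    (the host, when run on the agent); the last is the innermost namespace,
    e.g. the PID seen inside a container.
    """
    for line in status_text.splitlines():
        if line.startswith("NSpid:"):
            return [int(tok) for tok in line.split()[1:]]
    return []  # no NSpid line, e.g. on kernels older than 4.1


def pid_inside_container(host_pid: int) -> int:
    """Map a PID seen on the host to the PID seen in its innermost namespace."""
    pids = parse_nspid(Path(f"/proc/{host_pid}/status").read_text())
    return pids[-1] if pids else host_pid
```

On the agent, `pid_inside_container(13276)` would be expected to return `789` if both PIDs refer to the same Application Master process.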

On Thu, Jul 19, 2018 at 10:00 PM Till Rohrmann <[hidden email]> wrote:

 

But I don’t see any containers for Task Managers.

 

I thought maybe the Task Managers run directly on the VM (PIDs 13688 and 13892), but my code executed in the Task Managers has no access to the VM’s filesystem.

 

It is almost like there are more containers running than “docker ps” is showing me. Can someone clarify?
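One way to test that suspicion is sketched below. It assumes you have a root shell on the agent VM. Every Linux process exposes its mount namespace as a symlink at /proc/<pid>/ns/mnt. Suppose the Task Manager JVMs (PIDs 13688 and 13892 above) resolve to a different mount namespace than PID 1 on the VM. Then they have their own filesystem view. That would be consistent with them running in a container the Docker daemon does not know about, which "docker ps" would therefore not list. The helper name same_mnt_ns is hypothetical.

```shell
#!/usr/bin/env bash
# same_mnt_ns is a hypothetical helper: succeeds (exit 0) if two PIDs share a
# mount namespace, returns 1 if they differ, 2 if either PID can't be inspected.
# Each /proc/<pid>/ns/mnt symlink resolves to something like "mnt:[4026531840]";
# two processes see the same filesystem tree only if these targets are identical.
same_mnt_ns() {
  local a b
  a="$(readlink "/proc/$1/ns/mnt")" || return 2
  b="$(readlink "/proc/$2/ns/mnt")" || return 2
  [ "$a" = "$b" ]
}

# Usage on the agent VM, as root (PIDs taken from the ps output above):
#   for pid in 13688 13892; do
#     if same_mnt_ns 1 "$pid"; then echo "$pid: same filesystem view as the host"
#     else echo "$pid: separate mount namespace"; fi
#   done
```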

Also, what is the relationship between PID 13276 and the process that I see in the container (the two processes with the same command line)?

 

Thanks!

Alex

 

--
Liu, Renjie
Software Engineer, MVAD

RE: Flink on Mesos: containers question

NEKRASSOV, ALEXEI

Renjie,

 

In my observation, the Task Managers don’t run in Docker containers; they run as JVM processes directly on the VM.

The only Docker container is the one that runs the Job Manager.

 

What am I missing?

 

Thanks,

Alex

 

From: Renjie Liu [mailto:[hidden email]]
Sent: Friday, July 20, 2018 8:56 PM
To: Till Rohrmann <[hidden email]>
Cc: NEKRASSOV, ALEXEI <[hidden email]>; Fabian Hueske <[hidden email]>; user <[hidden email]>
Subject: Re: Flink on Mesos: containers question

 

Hi Alexei:

 

What you pasted is the expected behavior. The JobManager and the two TaskManagers should each run in a Docker instance.

 

13276 should be the JobManager process, and it's the same process as 789. The two have different process IDs because they are shown from different PID namespaces (a Linux kernel feature that Docker relies on, separate from cgroups).
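The sketch below checks this directly on the agent VM. It assumes Linux 4.1 or newer, where /proc/<pid>/status carries an NSpid field. That field lists the process's PID in every PID namespace it belongs to, outermost first. If 13276 and 789 are the same JobManager process, the field for 13276 should contain both numbers. The helper name ns_pids is hypothetical. "docker inspect" with the .State.Pid template is the standard Docker CLI way to get the host PID of a container's main process.

```shell
#!/usr/bin/env bash
# ns_pids is a hypothetical helper: print the PIDs a process has in each PID
# namespace it belongs to, outermost first (needs the NSpid field, Linux >= 4.1).
ns_pids() {
  awk '/^NSpid:/ { $1 = ""; sub(/^ +/, ""); print }' "/proc/$1/status"
}

# Usage on the agent VM, as root:
#   ns_pids 13276                                          # host PID, then the in-container PID
#   docker inspect --format '{{.State.Pid}}' 97b6840466c0  # host PID of the container's PID 1
```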

 

On Thu, Jul 19, 2018 at 10:00 PM Till Rohrmann <[hidden email]> wrote:

Hi Alexei,

 

I have actually never used Mesos with container images. I have always used it in a way where the Mesos task starts the Java process directly.

 

Cheers,

Till

 

On Thu, Jul 19, 2018 at 2:44 PM NEKRASSOV, ALEXEI <[hidden email]> wrote:

Till,

 

Any insight into how Flink components are containerized in Mesos?

 

Thanks!

Alex

 

From: Fabian Hueske [mailto:[hidden email]]
Sent: Monday, July 16, 2018 7:57 AM
To: NEKRASSOV, ALEXEI <[hidden email]>
Cc: [hidden email]; Till Rohrmann <[hidden email]>
Subject: Re: Flink on Mesos: containers question

 

Hi Alexei,

 

Till (in CC) is familiar with Flink's Mesos support in 1.4.x.

 

Best, Fabian

 

2018-07-13 15:07 GMT+02:00 NEKRASSOV, ALEXEI <[hidden email]>:


 

--

Liu, Renjie

Software Engineer, MVAD


Re: Flink on Mesos: containers question

Renjie Liu
Hi:
As I said, the Docker process and the JobManager process are the same one.

To start the TaskManagers in Docker, you need to set "mesos.resourcemanager.tasks.container.type" to "docker" in the JobManager (application master) configuration; otherwise Flink will just start the TaskManagers as plain processes.
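Here is a sketch of what that looks like. It assumes the session is started by hand with bin/mesos-appmaster.sh from the Flink 1.4.2 distribution, not through the DC/OS package used above, which may expose these settings differently. The container settings are passed as -D dynamic properties, the same way the other options appear on the MesosApplicationMasterRunner command line earlier in the thread. The Mesos master address and the image name are placeholders. The image is assumed to provide a Java runtime.

```shell
#!/usr/bin/env bash
# Start a Flink 1.4 session on Mesos whose TaskManagers run in Docker containers.
# master.example.org:5050 and example/flink-taskmanager:1.4.2 are hypothetical placeholders.
./bin/mesos-appmaster.sh \
  -Dmesos.master=master.example.org:5050 \
  -Dmesos.initial-tasks=2 \
  -Dmesos.resourcemanager.tasks.cpus=2 \
  -Dmesos.resourcemanager.tasks.mem=2048 \
  -Dtaskmanager.numberOfTaskSlots=1 \
  -Dmesos.resourcemanager.tasks.container.type=docker \
  -Dmesos.resourcemanager.tasks.container.image.name=example/flink-taskmanager:1.4.2

# Afterwards, on the agent: judging by the JobManager's container name above
# (mesos-<id>), Mesos-launched Docker containers carry a "mesos-" prefix, so the
# TaskManager containers should show up with:
#   docker ps --filter "name=mesos-"
```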

I don't understand what you mean when you say you can't access the VM's filesystem.

On Tue, Jul 31, 2018 at 2:25 AM NEKRASSOV, ALEXEI <[hidden email]> wrote:


--
Liu, Renjie
Software Engineer, MVAD