HA Mode and standalone containers compatibility ?


HA Mode and standalone containers compatibility ?

LINZ, Arnaud

Hello,

 

I have both streaming and batch applications. Since their memory needs are not the same, I have been using a long-lived container for my streaming apps and new short-lived containers to host each batch execution.

 

For that, I submit streaming jobs with "flink run" and batch jobs with "flink run -m yarn-cluster".

 

This was working fine until I turned ZooKeeper HA mode on for my streaming applications.

Even though I set the HA options not in the YAML Flink configuration file but with -D options on the yarn-session.sh command line, my batch jobs now try to run in the streaming container and fail because of the lack of resources.

 

My HA options are:

-Dyarn.application-attempts=10 -Drecovery.mode=zookeeper -Drecovery.zookeeper.quorum=h1r1en01:2181 -Drecovery.zookeeper.path.root=/flink  -Dstate.backend=filesystem -Dstate.backend.fs.checkpointdir=hdfs:///tmp/flink/checkpoints -Drecovery.zookeeper.storageDir=hdfs:///tmp/flink/recovery/
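
(For reference, a minimal sketch of how such a session would be started, with the HA settings passed as dynamic properties on the yarn-session.sh command line; the container count, memory and slot values here are illustrative assumptions, not taken from this thread:)

/usr/lib/flink/bin/yarn-session.sh -n 6 -tm 4096 -s 4 \
  -Dyarn.application-attempts=10 \
  -Drecovery.mode=zookeeper \
  -Drecovery.zookeeper.quorum=h1r1en01:2181 \
  -Drecovery.zookeeper.path.root=/flink \
  -Dstate.backend=filesystem \
  -Dstate.backend.fs.checkpointdir=hdfs:///tmp/flink/checkpoints \
  -Drecovery.zookeeper.storageDir=hdfs:///tmp/flink/recovery/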

 

Am I missing something?

 

Best regards,

Arnaud





Re: HA Mode and standalone containers compatibility ?

Till Rohrmann
Hi Arnaud,

As long as you don't have HA activated for your batch jobs, HA shouldn't have an influence on the batch execution. If it interferes, then you should see additional task managers connected to the streaming cluster when you execute the batch job. Could you check that? Furthermore, could you check whether a second YARN application is actually started when you run the batch jobs?
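
(For example, a quick way to check the second point is to list the running YARN applications before and while the batch job runs; a second, short-lived Flink application should show up next to the long-running streaming session:)

yarn application -list -appStates RUNNING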

Cheers,
Till



RE: HA Mode and standalone containers compatibility ?

LINZ, Arnaud

Yes, it does interfere: I do have additional task managers. My batch application shows up in my streaming cluster's Flink GUI instead of creating its own container with its own GUI, despite the -m yarn-cluster option.

 


 


Re: HA Mode and standalone containers compatibility ?

Ufuk Celebi
Hey Arnaud,

Thanks for reporting this. I think Till’s suggestion will help to debug this (checking whether a second YARN application has been started)…

You don’t want to run the batch application in HA mode, correct?

It sounds like the batch job is submitted with the same config keys. Could you start the batch job explicitly with -Drecovery.mode=standalone?

If you do want the batch job to be HA as well, you have to configure separate Zookeeper root paths:

recovery.zookeeper.path.root: /flink-streaming-1 # for the streaming session

recovery.zookeeper.path.root: /flink-batch # for the batch session
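
(A rough sketch of what the two set-ups could look like on the command line; passing the properties to the per-job cluster via the -yD prefix of "flink run -m yarn-cluster" is an assumption here, while the keys themselves are the ones discussed above:)

# long-running streaming session, HA with its own ZooKeeper root
bin/yarn-session.sh -Drecovery.mode=zookeeper -Drecovery.zookeeper.path.root=/flink-streaming-1 ...

# batch job without HA
bin/flink run -m yarn-cluster -yD recovery.mode=standalone ...

# or: batch job with HA under a separate ZooKeeper root
bin/flink run -m yarn-cluster -yD recovery.mode=zookeeper -yD recovery.zookeeper.path.root=/flink-batch ...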

– Ufuk



RE: HA Mode and standalone containers compatibility ?

LINZ, Arnaud
In reply to this post by Till Rohrmann

More details:

 

Command =

/usr/lib/flink/bin/flink run -m yarn-cluster -yn 48 -ytm 5120 -yqu batch1 -ys 4 --class com.bouygtel.kubera.main.segstage.MainGeoSegStage /home/voyager/KBR/GOS/lib/KUBERA-GEO-SOURCE-0.0.1-SNAPSHOT-allinone.jar  -j /home/voyager/KBR/GOS/log -c /home/voyager/KBR/GOS/cfg/KBR_GOS_Config.cfg

 

 

The start of the trace is:

Found YARN properties file /tmp/.yarn-properties-voyager

YARN properties set default parallelism to 24

Using JobManager address from YARN properties bt1shli3.bpa.bouyguestelecom.fr/172.21.125.28:36700

YARN cluster mode detected. Switching Log4j output to console

 

 

The content of /tmp/.yarn-properties-voyager is related to the streaming cluster:

 

#Generated YARN properties file

#Thu Dec 03 11:03:06 CET 2015

parallelism=24

dynamicPropertiesString=yarn.heap-cutoff-ratio\=0.6@@yarn.application-attempts\=10@@recovery.mode\=zookeeper@@recovery.zookeeper.quorum\=h1r1en01\:2181@@recovery.zookeeper.path.root\=/flink@@state.backend\=filesystem@@state.backend.fs.checkpointdir\=hdfs\:///tmp/flink/checkpoints@@recovery.zookeeper.storageDir\=hdfs\:///tmp/flink/recovery/

jobManager=172.21.125.28\:36700
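
(In other words, both the jobManager address and the dynamic HA properties recorded here belong to the streaming session, and the client reads this file on every submission, as the trace above shows. The file can be inspected directly with, e.g.:)

cat /tmp/.yarn-properties-voyager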

 

 

 

 


 


RE: HA Mode and standalone containers compatibility ?

LINZ, Arnaud
In reply to this post by Ufuk Celebi
Hi,

The batch job does not need to be HA.
I stopped everything, cleaned the temp files, added -Drecovery.mode=standalone, and it seems to work now!
Strange, but good for me for now.

Thanks,
Arnaud



RE: HA Mode and standalone containers compatibility ?

LINZ, Arnaud
In reply to this post by Ufuk Celebi
Oops... False joy.

In fact, it does start another container, but this container ends immediately because the job is not submitted to that container but to the streaming one.

Log details:

Command =
#  JVM_ARGS =  -DCluster.Parallelisme=150  -Drecovery.mode=standalone
/usr/lib/flink/bin/flink run -m yarn-cluster -yn 48 -ytm 5120 -yqu batch1 -ys 4 --class com.bouygtel.kubera.main.segstage.MainGeoSegStage /home/voyager/KBR/GOS/lib/KUBERA-GEO-SOURCE-0.0.1-SNAPSHOT-allinone.jar  -j /home/voyager/KBR/GOS/log -c /home/voyager/KBR/GOS/cfg/KBR_GOS_Config.cfg

Log =
Found YARN properties file /tmp/.yarn-properties-voyager
YARN properties set default parallelism to 24
Using JobManager address from YARN properties bt1shli3.bpa.bouyguestelecom.fr/172.21.125.28:36700
YARN cluster mode detected. Switching Log4j output to console
11:39:18,192 INFO  org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl     - Timeline service address: http://h1r1dn02.bpa.bouyguestelecom.fr:8188/ws/v1/timeline/
11:39:18,349 INFO  org.apache.hadoop.yarn.client.RMProxy                         - Connecting to ResourceManager at h1r1nn01.bpa.bouyguestelecom.fr/172.21.125.3:8050
11:39:18,504 INFO  org.apache.flink.client.FlinkYarnSessionCli                   - No path for the flink jar passed. Using the location of class org.apache.flink.yarn.FlinkYarnClient to locate the jar
11:39:18,513 INFO  org.apache.flink.yarn.FlinkYarnClient                         - Using values:
11:39:18,515 INFO  org.apache.flink.yarn.FlinkYarnClient                         -   TaskManager count = 48
11:39:18,515 INFO  org.apache.flink.yarn.FlinkYarnClient                         -   JobManager memory = 1024
11:39:18,515 INFO  org.apache.flink.yarn.FlinkYarnClient                         -   TaskManager memory = 5120
11:39:18,641 WARN  org.apache.flink.yarn.FlinkYarnClient                         - The JobManager or TaskManager memory is below the smallest possible YARN Container size. The value of 'yarn.scheduler.minimum-allocation-mb' is '2048'. Please increase the memory size.YARN will allocate the smaller containers but the scheduler will account for the minimum-allocation-mb, maybe not all instances you requested will start.
11:39:19,102 INFO  org.apache.flink.yarn.Utils                                   - Copying from file:/usr/lib/flink/lib/flink-dist_2.11-0.10.0.jar to hdfs://h1r1nn01.bpa.bouyguestelecom.fr:8020/user/voyager/.flink/application_1449127732314_0046/flink-dist_2.11-0.10.0.jar
11:39:19,653 INFO  org.apache.flink.yarn.Utils                                   - Copying from /usr/lib/flink/conf/flink-conf.yaml to hdfs://h1r1nn01.bpa.bouyguestelecom.fr:8020/user/voyager/.flink/application_1449127732314_0046/flink-conf.yaml
11:39:19,667 INFO  org.apache.flink.yarn.Utils                                   - Copying from file:/usr/lib/flink/conf/logback.xml to hdfs://h1r1nn01.bpa.bouyguestelecom.fr:8020/user/voyager/.flink/application_1449127732314_0046/logback.xml
11:39:19,679 INFO  org.apache.flink.yarn.Utils                                   - Copying from file:/usr/lib/flink/conf/log4j.properties to hdfs://h1r1nn01.bpa.bouyguestelecom.fr:8020/user/voyager/.flink/application_1449127732314_0046/log4j.properties
11:39:19,698 INFO  org.apache.flink.yarn.FlinkYarnClient                         - Submitting application master application_1449127732314_0046
11:39:19,723 INFO  org.apache.hadoop.yarn.client.api.impl.YarnClientImpl         - Submitted application application_1449127732314_0046
11:39:19,723 INFO  org.apache.flink.yarn.FlinkYarnClient                         - Waiting for the cluster to be allocated
11:39:19,725 INFO  org.apache.flink.yarn.FlinkYarnClient                         - Deploying cluster, current state ACCEPTED
11:39:20,727 INFO  org.apache.flink.yarn.FlinkYarnClient                         - Deploying cluster, current state ACCEPTED
11:39:21,728 INFO  org.apache.flink.yarn.FlinkYarnClient                         - Deploying cluster, current state ACCEPTED
11:39:22,730 INFO  org.apache.flink.yarn.FlinkYarnClient                         - Deploying cluster, current state ACCEPTED
11:39:23,731 INFO  org.apache.flink.yarn.FlinkYarnClient                         - YARN application has been deployed successfully.
11:39:23,734 INFO  org.apache.flink.yarn.FlinkYarnCluster                        - Start actor system.
11:39:24,192 INFO  org.apache.flink.yarn.FlinkYarnCluster                        - Start application client.
YARN cluster started
JobManager web interface address http://h1r1nn01.bpa.bouyguestelecom.fr:8088/proxy/application_1449127732314_0046/
Waiting until all TaskManagers have connected
11:39:24,202 INFO  org.apache.flink.yarn.ApplicationClient                       - Notification about new leader address akka.tcp://flink@172.21.125.16:59907/user/jobmanager with session ID null.
No status updates from the YARN cluster received so far. Waiting ...
11:39:24,206 INFO  org.apache.flink.yarn.ApplicationClient                       - Received address of new leader akka.tcp://flink@172.21.125.16:59907/user/jobmanager with session ID null.
11:39:24,206 INFO  org.apache.flink.yarn.ApplicationClient                       - Disconnect from JobManager null.
11:39:24,210 INFO  org.apache.flink.yarn.ApplicationClient                       - Trying to register at JobManager akka.tcp://flink@172.21.125.16:59907/user/jobmanager.
11:39:24,377 INFO  org.apache.flink.yarn.ApplicationClient                       - Successfully registered at the JobManager Actor[akka.tcp://flink@172.21.125.16:59907/user/jobmanager#-801507205]
TaskManager status (0/48)
TaskManager status (0/48)
TaskManager status (0/48)
TaskManager status (0/48)
TaskManager status (0/48)
TaskManager status (0/48)
TaskManager status (0/48)
TaskManager status (0/48)
TaskManager status (0/48)
TaskManager status (0/48)
TaskManager status (0/48)
TaskManager status (0/48)
TaskManager status (0/48)
TaskManager status (0/48)
TaskManager status (0/48)
TaskManager status (0/48)
TaskManager status (12/48)
TaskManager status (12/48)
TaskManager status (12/48)
TaskManager status (12/48)
TaskManager status (46/48)
TaskManager status (46/48)
TaskManager status (46/48)
TaskManager status (46/48)
All TaskManagers are connected
Using the parallelism provided by the remote cluster (192). To use another parallelism, set it at the ./bin/flink client.
12/03/2015 11:39:55  Job execution switched to status RUNNING.
12/03/2015 11:39:55  CHAIN DataSource (at createInput(ExecutionEnvironment.java:508) (com.bouygtel.kuberasdk.hive.HiveHCatDAO$1)) -> FlatMap (FlatMap at readTable(HiveHCatDAO.java:120)) -> Map (Key Extractor 1)(1/150) switched to SCHEDULED
12/03/2015 11:39:55  CHAIN DataSource (at createInput(ExecutionEnvironment.java:508) (com.bouygtel.kuberasdk.hive.HiveHCatDAO$1)) -> FlatMap (FlatMap at readTable(HiveHCatDAO.java:120)) -> Map (Key Extractor 1)(1/150) switched to DEPLOYING
=> The job starts

Then it crashes:

org.apache.flink.runtime.jobmanager.scheduler.NoResourceAvailableException: Not enough free slots available to run the job. You can decrease the operator parallelism or increase the number of slots per TaskManager in the configuration. Task to schedule: < Attempt #0 (CHAIN DataSource (at createInput(ExecutionEnvironment.java:508) (com.bouygtel.kuberasdk.hive.HiveHCatDAO$1)) -> FlatMap (FlatMap at readTable(HiveHCatDAO.java:120)) -> Map (Key Extractor 1) (5/150)) @ (unassigned) - [SCHEDULED] > with groupID < 7b9e554a93d3ea946d13d239a99bb6ae > in sharing group < SlotSharingGroup [0c9285747d113d8dd85962602b674497, 9f30db9a30430385e1cd9d0f5010ed9e, 36b825566212059be3f888e3bbdf0d96, f95ba68c3916346efe497b937393eb49, e73522cce11e699022c285180fd1024d, 988b776310ef3d8a2a3875227008a30e, 7b9e554a93d3ea946d13d239a99bb6ae, 08af3a01b9cb49b76e6aedcd57d57788, 3f91660c6ab25f0f77d8e55d54397b01] >. Resources available to scheduler: Number of instances=6, total number of slots=24, available slots=0

Stating that I have only 24 slots on my 48-container cluster!
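
(For what it's worth, the numbers are consistent with the job plan being sent to the streaming JobManager taken from the properties file rather than to the freshly started cluster. Assuming the streaming session runs 6 TaskManagers with 4 slots each:

  streaming session :  6 TaskManagers x 4 slots =  24 slots   -- matches "total number of slots=24" in the error
  new batch cluster : 48 TaskManagers x 4 slots = 192 slots   -- matches the parallelism of 192 reported above)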






Re: HA Mode and standalone containers compatibility ?

Ufuk Celebi

> On 03 Dec 2015, at 11:47, LINZ, Arnaud <[hidden email]> wrote:
>
> Oopss... False joy.

OK, I think this is a bug in the YARN Client and the way it uses the .properties files to submit jobs.

As a workaround: can you mv the /tmp/.yarn-properties-voyager file and then submit the batch job?

mv /tmp/.yarn-properties-voyager /tmp/.bak.yarn-properties-voyager
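
(Presumably the file can be moved back afterwards so that later submissions against the streaming session still find it:)

mv /tmp/.bak.yarn-properties-voyager /tmp/.yarn-properties-voyager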

– Ufuk


RE: HA Mode and standalone containers compatibility ?

LINZ, Arnaud
Hi,
It works fine with that file renamed. Is there a way to specify its path for a specific execution, to have a proper workaround?
Thanks,
Arnaud


Re: HA Mode and standalone containers compatibility ?

Ufuk Celebi
I opened an issue for it and it will be fixed with the next 0.10.2 release.

@Robert: are you aware of another workaround for the time being?



Re: HA Mode and standalone containers compatibility ?

rmetzger0
There is a configuration parameter called "yarn.properties-file.location" which allows setting a custom path for the properties file.
If the batch and streaming jobs are using different configuration files, it should work.
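
(A rough sketch of that separation, assuming a dedicated configuration directory for batch submissions selected via the FLINK_CONF_DIR environment variable; the directory paths are illustrative:)

# /opt/flink-conf-batch/flink-conf.yaml -- copy of the normal config, plus:
yarn.properties-file.location: /tmp/flink-batch

# submit the batch job with its own config directory
FLINK_CONF_DIR=/opt/flink-conf-batch /usr/lib/flink/bin/flink run -m yarn-cluster -yn 48 -ytm 5120 -yqu batch1 -ys 4 ...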




RE: HA Mode and standalone containers compatibility ?

LINZ, Arnaud

Hi,

I've tried putting that parameter in JVM_ARGS, but without much success.

 

# JVM_ARGS :  -DCluster.Parallelisme=150  -Drecovery.mode=standalone -Dyarn.properties-file.location=/tmp/flink/batch

(…)

2015:12:03 15:25:42 (ThrdExtrn) - INFO - (...)jobs.exec.ExecutionProcess$1.run - > Found YARN properties file /tmp/.yarn-properties-voyager

 

Arnaud

 

 
