HA Mode and standalone containers compatibility ?


HA Mode and standalone containers compatibility ?

LINZ, Arnaud

Hello,

 

I have both streaming and batch applications. Since their memory needs are not the same, I have been using a long-lived container for my streaming apps and new short-lived containers to host each batch execution.

 

For that, I submit streaming jobs with "flink run" and batch jobs with "flink run -m yarn-cluster".

 

This was working fine until I turned ZooKeeper HA mode on for my streaming applications.

Even though I set the HA options not in the YAML Flink configuration file but with -D options on the yarn-session.sh command line, my batch jobs now try to run in the streaming container and fail because of the lack of resources.

 

My HA options are:

-Dyarn.application-attempts=10 -Drecovery.mode=zookeeper -Drecovery.zookeeper.quorum=h1r1en01:2181 -Drecovery.zookeeper.path.root=/flink  -Dstate.backend=filesystem -Dstate.backend.fs.checkpointdir=hdfs:///tmp/flink/checkpoints -Drecovery.zookeeper.storageDir=hdfs:///tmp/flink/recovery/
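
(For reference, a minimal sketch of how such a session would be started, with the HA settings passed as dynamic properties on the yarn-session.sh command line; the container count, memory and slot values here are illustrative assumptions, not taken from this thread:)

/usr/lib/flink/bin/yarn-session.sh -n 6 -tm 4096 -s 4 \
  -Dyarn.application-attempts=10 \
  -Drecovery.mode=zookeeper \
  -Drecovery.zookeeper.quorum=h1r1en01:2181 \
  -Drecovery.zookeeper.path.root=/flink \
  -Dstate.backend=filesystem \
  -Dstate.backend.fs.checkpointdir=hdfs:///tmp/flink/checkpoints \
  -Drecovery.zookeeper.storageDir=hdfs:///tmp/flink/recovery/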

 

Am I missing something?

 

Best regards,

Arnaud





Re: HA Mode and standalone containers compatibility ?

Till Rohrmann
Hi Arnaud,

As long as you don't have HA activated for your batch jobs, HA shouldn't have an influence on the batch execution. If it interferes, then you should see additional task managers connected to the streaming cluster when you execute the batch job. Could you check that? Furthermore, could you check whether a second YARN application is actually started when you run the batch jobs?
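
(For example, a quick way to check the second point is to list the running YARN applications before and while the batch job runs; a second, short-lived Flink application should show up next to the long-running streaming session:)

yarn application -list -appStates RUNNING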

Cheers,
Till



RE: HA Mode and standalone containers compatibility ?

LINZ, Arnaud

Yes, it does interfere: I do have additional task managers. My batch application shows up in my streaming cluster's Flink GUI instead of creating its own container with its own GUI, despite the -m yarn-cluster option.

 


 


Re: HA Mode and standalone containers compatibility ?

Ufuk Celebi
Hey Arnaud,

Thanks for reporting this. I think Till’s suggestion will help to debug this (checking whether a second YARN application has been started)…

You don’t want to run the batch application in HA mode, correct?

It sounds like the batch job is submitted with the same config keys. Could you start the batch job explicitly with -Drecovery.mode=standalone?

If you do want the batch job to be HA as well, you have to configure separate Zookeeper root paths:

recovery.zookeeper.path.root: /flink-streaming-1 # for the streaming session

recovery.zookeeper.path.root: /flink-batch # for the batch session
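
(A rough sketch of what the two set-ups could look like on the command line; passing the properties to the per-job cluster via the -yD prefix of "flink run -m yarn-cluster" is an assumption here, while the keys themselves are the ones discussed above:)

# long-running streaming session, HA with its own ZooKeeper root
bin/yarn-session.sh -Drecovery.mode=zookeeper -Drecovery.zookeeper.path.root=/flink-streaming-1 ...

# batch job without HA
bin/flink run -m yarn-cluster -yD recovery.mode=standalone ...

# or: batch job with HA under a separate ZooKeeper root
bin/flink run -m yarn-cluster -yD recovery.mode=zookeeper -yD recovery.zookeeper.path.root=/flink-batch ...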

– Ufuk



RE: HA Mode and standalone containers compatibility ?

LINZ, Arnaud
In reply to this post by Till Rohrmann

More details:

 

Command =

/usr/lib/flink/bin/flink run -m yarn-cluster -yn 48 -ytm 5120 -yqu batch1 -ys 4 --class com.bouygtel.kubera.main.segstage.MainGeoSegStage /home/voyager/KBR/GOS/lib/KUBERA-GEO-SOURCE-0.0.1-SNAPSHOT-allinone.jar  -j /home/voyager/KBR/GOS/log -c /home/voyager/KBR/GOS/cfg/KBR_GOS_Config.cfg

 

 

The start of the trace is:

Found YARN properties file /tmp/.yarn-properties-voyager

YARN properties set default parallelism to 24

Using JobManager address from YARN properties bt1shli3.bpa.bouyguestelecom.fr/172.21.125.28:36700

YARN cluster mode detected. Switching Log4j output to console

 

 

The content of /tmp/.yarn-properties-voyager is related to the streaming cluster:

 

#Generated YARN properties file

#Thu Dec 03 11:03:06 CET 2015

parallelism=24

dynamicPropertiesString=yarn.heap-cutoff-ratio\=0.6@@yarn.application-attempts\=10@@recovery.mode\=zookeeper@@recovery.zookeeper.quorum\=h1r1en01\:2181@@recovery.zookeeper.path.root\=/flink@@state.backend\=filesystem@@state.backend.fs.checkpointdir\=hdfs\:///tmp/flink/checkpoints@@recovery.zookeeper.storageDir\=hdfs\:///tmp/flink/recovery/

jobManager=172.21.125.28\:36700
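
(In other words, both the jobManager address and the dynamic HA properties recorded here belong to the streaming session, and the client reads this file on every submission, as the trace above shows. The file can be inspected directly with, e.g.:)

cat /tmp/.yarn-properties-voyager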

 

 

 

 


 


RE: HA Mode and standalone containers compatibility ?

LINZ, Arnaud
In reply to this post by Ufuk Celebi
Hi,

The batch job does not need to be HA.
I stopped everything, cleaned the temp files, added -Drecovery.mode=standalone, and it seems to work now!
Strange, but good for me for now.

Thanks,
Arnaud



RE: HA Mode and standalone containers compatibility ?

LINZ, Arnaud
In reply to this post by Ufuk Celebi
Oops... False joy.

In fact, it does start another container, but this container ends immediately because the job is not submitted to that container but to the streaming one.

Log details:

Command =
#  JVM_ARGS =  -DCluster.Parallelisme=150  -Drecovery.mode=standalone
/usr/lib/flink/bin/flink run -m yarn-cluster -yn 48 -ytm 5120 -yqu batch1 -ys 4 --class com.bouygtel.kubera.main.segstage.MainGeoSegStage /home/voyager/KBR/GOS/lib/KUBERA-GEO-SOURCE-0.0.1-SNAPSHOT-allinone.jar  -j /home/voyager/KBR/GOS/log -c /home/voyager/KBR/GOS/cfg/KBR_GOS_Config.cfg

Log =
Found YARN properties file /tmp/.yarn-properties-voyager
YARN properties set default parallelism to 24
Using JobManager address from YARN properties bt1shli3.bpa.bouyguestelecom.fr/172.21.125.28:36700
YARN cluster mode detected. Switching Log4j output to console
11:39:18,192 INFO  org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl     - Timeline service address: http://h1r1dn02.bpa.bouyguestelecom.fr:8188/ws/v1/timeline/
11:39:18,349 INFO  org.apache.hadoop.yarn.client.RMProxy                         - Connecting to ResourceManager at h1r1nn01.bpa.bouyguestelecom.fr/172.21.125.3:8050
11:39:18,504 INFO  org.apache.flink.client.FlinkYarnSessionCli                   - No path for the flink jar passed. Using the location of class org.apache.flink.yarn.FlinkYarnClient to locate the jar
11:39:18,513 INFO  org.apache.flink.yarn.FlinkYarnClient                         - Using values:
11:39:18,515 INFO  org.apache.flink.yarn.FlinkYarnClient                         -   TaskManager count = 48
11:39:18,515 INFO  org.apache.flink.yarn.FlinkYarnClient                         -   JobManager memory = 1024
11:39:18,515 INFO  org.apache.flink.yarn.FlinkYarnClient                         -   TaskManager memory = 5120
11:39:18,641 WARN  org.apache.flink.yarn.FlinkYarnClient                         - The JobManager or TaskManager memory is below the smallest possible YARN Container size. The value of 'yarn.scheduler.minimum-allocation-mb' is '2048'. Please increase the memory size.YARN will allocate the smaller containers but the scheduler will account for the minimum-allocation-mb, maybe not all instances you requested will start.
11:39:19,102 INFO  org.apache.flink.yarn.Utils                                   - Copying from file:/usr/lib/flink/lib/flink-dist_2.11-0.10.0.jar to hdfs://h1r1nn01.bpa.bouyguestelecom.fr:8020/user/voyager/.flink/application_1449127732314_0046/flink-dist_2.11-0.10.0.jar
11:39:19,653 INFO  org.apache.flink.yarn.Utils                                   - Copying from /usr/lib/flink/conf/flink-conf.yaml to hdfs://h1r1nn01.bpa.bouyguestelecom.fr:8020/user/voyager/.flink/application_1449127732314_0046/flink-conf.yaml
11:39:19,667 INFO  org.apache.flink.yarn.Utils                                   - Copying from file:/usr/lib/flink/conf/logback.xml to hdfs://h1r1nn01.bpa.bouyguestelecom.fr:8020/user/voyager/.flink/application_1449127732314_0046/logback.xml
11:39:19,679 INFO  org.apache.flink.yarn.Utils                                   - Copying from file:/usr/lib/flink/conf/log4j.properties to hdfs://h1r1nn01.bpa.bouyguestelecom.fr:8020/user/voyager/.flink/application_1449127732314_0046/log4j.properties
11:39:19,698 INFO  org.apache.flink.yarn.FlinkYarnClient                         - Submitting application master application_1449127732314_0046
11:39:19,723 INFO  org.apache.hadoop.yarn.client.api.impl.YarnClientImpl         - Submitted application application_1449127732314_0046
11:39:19,723 INFO  org.apache.flink.yarn.FlinkYarnClient                         - Waiting for the cluster to be allocated
11:39:19,725 INFO  org.apache.flink.yarn.FlinkYarnClient                         - Deploying cluster, current state ACCEPTED
11:39:20,727 INFO  org.apache.flink.yarn.FlinkYarnClient                         - Deploying cluster, current state ACCEPTED
11:39:21,728 INFO  org.apache.flink.yarn.FlinkYarnClient                         - Deploying cluster, current state ACCEPTED
11:39:22,730 INFO  org.apache.flink.yarn.FlinkYarnClient                         - Deploying cluster, current state ACCEPTED
11:39:23,731 INFO  org.apache.flink.yarn.FlinkYarnClient                         - YARN application has been deployed successfully.
11:39:23,734 INFO  org.apache.flink.yarn.FlinkYarnCluster                        - Start actor system.
11:39:24,192 INFO  org.apache.flink.yarn.FlinkYarnCluster                        - Start application client.
YARN cluster started
JobManager web interface address http://h1r1nn01.bpa.bouyguestelecom.fr:8088/proxy/application_1449127732314_0046/
Waiting until all TaskManagers have connected
11:39:24,202 INFO  org.apache.flink.yarn.ApplicationClient                       - Notification about new leader address akka.tcp://flink@172.21.125.16:59907/user/jobmanager with session ID null.
No status updates from the YARN cluster received so far. Waiting ...
11:39:24,206 INFO  org.apache.flink.yarn.ApplicationClient                       - Received address of new leader akka.tcp://flink@172.21.125.16:59907/user/jobmanager with session ID null.
11:39:24,206 INFO  org.apache.flink.yarn.ApplicationClient                       - Disconnect from JobManager null.
11:39:24,210 INFO  org.apache.flink.yarn.ApplicationClient                       - Trying to register at JobManager akka.tcp://flink@172.21.125.16:59907/user/jobmanager.
11:39:24,377 INFO  org.apache.flink.yarn.ApplicationClient                       - Successfully registered at the JobManager Actor[akka.tcp://flink@172.21.125.16:59907/user/jobmanager#-801507205]
TaskManager status (0/48)
TaskManager status (0/48)
TaskManager status (0/48)
TaskManager status (0/48)
TaskManager status (0/48)
TaskManager status (0/48)
TaskManager status (0/48)
TaskManager status (0/48)
TaskManager status (0/48)
TaskManager status (0/48)
TaskManager status (0/48)
TaskManager status (0/48)
TaskManager status (0/48)
TaskManager status (0/48)
TaskManager status (0/48)
TaskManager status (0/48)
TaskManager status (12/48)
TaskManager status (12/48)
TaskManager status (12/48)
TaskManager status (12/48)
TaskManager status (46/48)
TaskManager status (46/48)
TaskManager status (46/48)
TaskManager status (46/48)
All TaskManagers are connected
Using the parallelism provided by the remote cluster (192). To use another parallelism, set it at the ./bin/flink client.
12/03/2015 11:39:55  Job execution switched to status RUNNING.
12/03/2015 11:39:55  CHAIN DataSource (at createInput(ExecutionEnvironment.java:508) (com.bouygtel.kuberasdk.hive.HiveHCatDAO$1)) -> FlatMap (FlatMap at readTable(HiveHCatDAO.java:120)) -> Map (Key Extractor 1)(1/150) switched to SCHEDULED
12/03/2015 11:39:55  CHAIN DataSource (at createInput(ExecutionEnvironment.java:508) (com.bouygtel.kuberasdk.hive.HiveHCatDAO$1)) -> FlatMap (FlatMap at readTable(HiveHCatDAO.java:120)) -> Map (Key Extractor 1)(1/150) switched to DEPLOYING
=> The job starts

Then it crashes:

org.apache.flink.runtime.jobmanager.scheduler.NoResourceAvailableException: Not enough free slots available to run the job. You can decrease the operator parallelism or increase the number of slots per TaskManager in the configuration. Task to schedule: < Attempt #0 (CHAIN DataSource (at createInput(ExecutionEnvironment.java:508) (com.bouygtel.kuberasdk.hive.HiveHCatDAO$1)) -> FlatMap (FlatMap at readTable(HiveHCatDAO.java:120)) -> Map (Key Extractor 1) (5/150)) @ (unassigned) - [SCHEDULED] > with groupID < 7b9e554a93d3ea946d13d239a99bb6ae > in sharing group < SlotSharingGroup [0c9285747d113d8dd85962602b674497, 9f30db9a30430385e1cd9d0f5010ed9e, 36b825566212059be3f888e3bbdf0d96, f95ba68c3916346efe497b937393eb49, e73522cce11e699022c285180fd1024d, 988b776310ef3d8a2a3875227008a30e, 7b9e554a93d3ea946d13d239a99bb6ae, 08af3a01b9cb49b76e6aedcd57d57788, 3f91660c6ab25f0f77d8e55d54397b01] >. Resources available to scheduler: Number of instances=6, total number of slots=24, available slots=0

Stating that I have only 24 slots on my 48-container cluster!
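
(For what it's worth, the numbers are consistent with the job plan being sent to the streaming JobManager taken from the properties file rather than to the freshly started cluster. Assuming the streaming session runs 6 TaskManagers with 4 slots each:

  streaming session :  6 TaskManagers x 4 slots =  24 slots   -- matches "total number of slots=24" in the error
  new batch cluster : 48 TaskManagers x 4 slots = 192 slots   -- matches the parallelism of 192 reported above)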






Re: HA Mode and standalone containers compatibility ?

Ufuk Celebi

> On 03 Dec 2015, at 11:47, LINZ, Arnaud <[hidden email]> wrote:
>
> Oopss... False joy.

OK, I think this is a bug in the YARN Client and the way it uses the .properties files to submit jobs.

As a workaround: can you mv the /tmp/.yarn-properties-voyager file and then submit the batch job?

mv /tmp/.yarn-properties-voyager /tmp/.bak.yarn-properties-voyager
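
(Presumably the file can be moved back afterwards so that later submissions against the streaming session still find it:)

mv /tmp/.bak.yarn-properties-voyager /tmp/.yarn-properties-voyager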

– Ufuk


RE: HA Mode and standalone containers compatibility ?

LINZ, Arnaud
Hi,
It works fine with that file renamed. Is there a way to specify its path for a specific execution, to have a proper workaround?
Thanks,
Arnaud


Re: HA Mode and standalone containers compatibility ?

Ufuk Celebi
I opened an issue for it and it will be fixed with the next 0.10.2 release.

@Robert: are you aware of another workaround for the time being?



Re: HA Mode and standalone containers compatibility ?

rmetzger0
There is a configuration parameter called "yarn.properties-file.location" which allows setting a custom path for the properties file.
If the batch and streaming jobs are using different configuration files, it should work.
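
(A rough sketch of that separation, assuming a dedicated configuration directory for batch submissions selected via the FLINK_CONF_DIR environment variable; the directory paths are illustrative:)

# /opt/flink-conf-batch/flink-conf.yaml -- copy of the normal config, plus:
yarn.properties-file.location: /tmp/flink-batch

# submit the batch job with its own config directory
FLINK_CONF_DIR=/opt/flink-conf-batch /usr/lib/flink/bin/flink run -m yarn-cluster -yn 48 -ytm 5120 -yqu batch1 -ys 4 ...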




RE: HA Mode and standalone containers compatibility ?

LINZ, Arnaud

Hi,

I've tried putting that parameter in JVM_ARGS, but without much success.

 

# JVM_ARGS :  -DCluster.Parallelisme=150  -Drecovery.mode=standalone -Dyarn.properties-file.location=/tmp/flink/batch

(…)

2015:12:03 15:25:42 (ThrdExtrn) - INFO - (...)jobs.exec.ExecutionProcess$1.run - > Found YARN properties file /tmp/.yarn-properties-voyager

 

Arnaud

 

 
