Hi guys,

I've run into some problems with Flink on YARN. I am trying to deploy Flink to our cluster using /usr/lib/flink-scala2.10/bin/yarn-session.sh, but the YARN application does not even start; it goes from accepted straight to finished/failed. The YARN info on the ResourceManager looks like this:

User: wa-flink
Name: Flink session with 3 TaskManagers
Application Type: Apache Flink
Application Tags:
State: FINISHED
FinalStatus: FAILED
Started: Mon Aug 15 18:02:42 +0200 2016
Elapsed: 16sec
Tracking URL: History
Diagnostics: Fatal error in AM: The ContainerLaunchContext was not set.

On the dev cluster, applications deploy without problems; this happens only in production.

What could be wrong?

Thanks,

--
Miroslav Gajdoš
Development - Web Analytics (Brno)
https://reporter.seznam.cz
[hidden email]
This could be a bug in Flink. Can you share the complete logs of the run? CC'ing Max, who worked on the YARN client recently and might have an idea in which cases Flink would not set the context.

(In reply to Miroslav Gajdoš, Tue, Aug 16, 2016 at 11:00 AM)
The log from the YARN session runner is here:
http://pastebin.com/xW1W4HNP

Our Hadoop distribution is from Cloudera, ResourceManager version 2.6.0-cdh5.4.5. It runs in HA mode (there could be some redirecting to the active node when accessing the ResourceManager and/or NameNode).

(In reply to Ufuk Celebi, Tue, 16 Aug 2016 at 12:18 +0200)

--
Miroslav Gajdoš
Hi Miroslav,

From the logs it looks like you're using Flink version 1.0.x. The ContainerLaunchContext is always set by Flink, so I'm wondering why this error can still occur. Are you using the default Hadoop version that comes with Flink (2.3.0)? You could try the Hadoop 2.6.0 build of Flink.

Does your dev cluster share the ZooKeeper installation with the production cluster? I'm asking because it receives incorrect leadership information although the leading JobManager seems to be attempting to register at the ApplicationMaster.

Best,
Max

(In reply to Miroslav Gajdoš, Tue, Aug 16, 2016 at 1:28 PM)
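As a rough illustration of the version question above, the sketch below compares the Hadoop version a Flink build targets with the version the cluster reports. The helper names are hypothetical, and treating "same major.minor" as the thing to check is only an assumption made for this example; it is not a compatibility rule stated anywhere in this thread.

```python
import re


def hadoop_base_version(version):
    """Extract (major, minor, patch) from strings such as '2.6.0-cdh5.4.5'."""
    match = re.match(r"(\d+)\.(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"unrecognised Hadoop version: {version!r}")
    return tuple(int(part) for part in match.groups())


def same_minor_line(flink_hadoop, cluster_hadoop):
    """Return True when both versions share the same major.minor line.

    Assumption for illustration only: a matching major.minor is used as a
    quick sanity check, not as a guarantee that the two are compatible.
    """
    return hadoop_base_version(flink_hadoop)[:2] == hadoop_base_version(cluster_hadoop)[:2]


if __name__ == "__main__":
    cluster = "2.6.0-cdh5.4.5"  # ResourceManager version reported in this thread
    for flink_build in ("2.3.0", "2.6.0"):
        print(flink_build, same_minor_line(flink_build, cluster))
```

A vendor suffix such as `-cdh5.4.5` is ignored by the parser, so the check only says whether the upstream base versions line up.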
Hi Max,

We are building it from source and packaging it for Debian. I can try the binary release for Hadoop 2.6.0.

Regarding ZooKeeper, we do not share instances between dev and production.

Thanks,
Miroslav

(In reply to Maximilian Michels, Thu, 18 Aug 2016 at 10:17 +0200)
I tried building it from source as well as using the prebuilt binary release (v1.1.1); the latter produced this log output:
http://pastebin.com/3L5Yhs9x

The application in YARN still fails with "Fatal error in AM: The ContainerLaunchContext was not set".

Mira

(In reply to Miroslav Gajdoš, Thu, 18 Aug 2016 at 10:36 +0200)
Hi Mira,

If I understood correctly, the log output should be from Flink 1.1.1. However, the log contains classes that don't exist in Flink 1.1.1, e.g. FlinkYarnClient. Could you please check whether you posted the correct log?

Also, it would be good to have not only the client log but also the log of the Flink YARN application.

Thanks,
Max

(In reply to Miroslav Gajdoš, Thu, Aug 18, 2016 at 3:20 PM)
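For anyone following along, one common way to get the YARN application log (as opposed to the client log) is the `yarn logs -applicationId` command. The sketch below wraps it in Python; the function names and the application ID are hypothetical placeholders, and it assumes the `yarn` CLI is on the PATH and that the cluster keeps aggregated logs available for the application.

```python
import subprocess


def yarn_logs_command(application_id):
    """Build the `yarn logs` invocation for a single application."""
    if not application_id.startswith("application_"):
        raise ValueError(f"not a YARN application ID: {application_id!r}")
    return ["yarn", "logs", "-applicationId", application_id]


def fetch_yarn_logs(application_id):
    """Run `yarn logs` and return its output as text.

    Assumes the `yarn` CLI is installed and configured for the cluster;
    raises CalledProcessError if the command exits with a non-zero status.
    """
    result = subprocess.run(
        yarn_logs_command(application_id),
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout


if __name__ == "__main__":
    # Hypothetical ID; use the one shown by the ResourceManager for the failed run.
    print(" ".join(yarn_logs_command("application_1471000000000_0001")))
```

Only the command builder runs in the example; `fetch_yarn_logs` needs a real cluster and a real application ID.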
Here is the log from the YARN application, run on another cluster (this time cdh5.7.0, but with a similar configuration). Check the hostnames: the configuration uses aliases, and judging by the log (exception at line 87), their difference from the FQDNs may be the cause.

http://pastebin.com/iimPVbXB

Thanks,
Mira

(In reply to Maximilian Michels, Fri, 19 Aug 2016 at 09:12 +0200)
Hi Mira,

Does using the fully-qualified hostname solve the issue?

Thanks,
Max

(In reply to Miroslav Gajdoš, Mon, Aug 22, 2016 at 1:38 PM)
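A quick way to see whether a configured hostname is an alias rather than a fully-qualified name is to compare it with what the resolver returns. The sketch below is only an illustration: the function name and the example hostnames are hypothetical, and the thread does not confirm that aliases were in fact the cause.

```python
import socket


def check_hostname(configured, resolve=socket.getfqdn):
    """Return (fqdn, matches) for a hostname taken from the configuration.

    `resolve` defaults to socket.getfqdn and can be swapped out, for example
    to try the comparison without touching DNS.
    """
    fqdn = resolve(configured)
    return fqdn, fqdn.lower() == configured.lower()


if __name__ == "__main__":
    # Hypothetical alias; replace with the hostnames used in the Hadoop/Flink config.
    for name in ("rm-alias",):
        fqdn, matches = check_hostname(name)
        status = "ok" if matches else "differs from FQDN"
        print(f"{name} -> {fqdn} ({status})")
```

Running it on the machine that submits the session shows each configured name next to the name the resolver reports, which makes alias/FQDN differences easy to spot.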