(DEPRECATED) Apache Flink User Mailing List archive.

flink on yarn - Fatal error in AM: The ContainerLaunchContext was not set

Miroslav Gajdoš

flink on yarn - Fatal error in AM: The ContainerLaunchContext was not set

Hi guys,

I've run into some problems with Flink on YARN. I am trying to deploy Flink to
our cluster using /usr/lib/flink-scala2.10/bin/yarn-session.sh, but the
YARN application does not even start; it goes from accepted to
finished/failed. The application info on the ResourceManager looks like this:

User: wa-flink
Name: Flink session with 3 TaskManagers
Application Type: Apache Flink
Application Tags:
State: FINISHED
FinalStatus: FAILED
Started: Mon Aug 15 18:02:42 +0200 2016
Elapsed: 16sec
Tracking URL: History
Diagnostics: Fatal error in AM: The ContainerLaunchContext was not set.

On the dev cluster, applications deploy without problems; this happens only
in production.

What could be wrong?

Thanks,

--
Miroslav Gajdoš
Development - Web Analytics (Brno)
https://reporter.seznam.cz
[hidden email]

Ufuk Celebi

Re: flink on yarn - Fatal error in AM: The ContainerLaunchContext was not set

This could be a bug in Flink. Can you share the complete logs of the
run? I'm CC'ing Max, who worked on the YARN client recently and might
know in which cases Flink would not set the context.

(In reply to Miroslav Gajdoš's message of Tue, Aug 16, 2016 at 11:00 AM.)

Miroslav Gajdoš

Re: flink on yarn - Fatal error in AM: The ContainerLaunchContext was not set

The log from the YARN session runner is here:
http://pastebin.com/xW1W4HNP

Our Hadoop distribution is from Cloudera, ResourceManager version
2.6.0-cdh5.4.5. It runs in HA mode, so there could be some redirection
to the active node when accessing the ResourceManager and/or NameNode.

(In reply to Ufuk Celebi's message of Tue, Aug 16, 2016 at 12:18 +0200.)

Maximilian Michels

Re: flink on yarn - Fatal error in AM: The ContainerLaunchContext was not set

Hi Miroslav,

From the logs it looks like you're using Flink version 1.0.x. The
ContainerLaunchContext is always set by Flink. I'm wondering why this
error can still occur. Are you using the default Hadoop version that
comes with Flink (2.3.0)? You could try the Hadoop 2.6.0 build of
Flink.

Does your dev cluster share the ZooKeeper installation with the
production cluster? I'm wondering because it receives incorrect
leadership information although the leading JobManager seems to be
attempting to register at the ApplicationMaster.

Best,
Max

(In reply to Miroslav Gajdoš's message of Tue, Aug 16, 2016 at 1:28 PM.)

Miroslav Gajdoš

Re: flink on yarn - Fatal error in AM: The ContainerLaunchContext was not set

Hi Max,

We build it from source and package it for Debian. I can try
the binary release for Hadoop 2.6.0.

Regarding ZooKeeper, we do not share instances between dev and
production.

Thanks,
Miroslav

(In reply to Maximilian Michels's message of Thu, Aug 18, 2016 at 10:17 +0200.)

Miroslav Gajdoš

Re: flink on yarn - Fatal error in AM: The ContainerLaunchContext was not set

I tried building it from source as well as using the prebuilt binary release
(v1.1.1); the latter produced this log output:
http://pastebin.com/3L5Yhs9x

The application in YARN still fails with "Fatal error in AM: The
ContainerLaunchContext was not set".

Mira

(Follow-up to Miroslav Gajdoš's own message of Thu, Aug 18, 2016 at 10:36 +0200.)

Maximilian Michels

Re: flink on yarn - Fatal error in AM: The ContainerLaunchContext was not set

Hi Mira,

If I understood correctly, the log output should be for Flink 1.1.1.
However, there are classes present in the log which don't exist in
Flink 1.1.1, e.g. FlinkYarnClient. Could you please check if you
posted the correct log?

Also, it would be good to have not only the client log but also the
log of the Flink Yarn application.

Thanks,
Max
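
For readers following along: the application-side log requested above is usually retrieved with the `yarn logs` command once the application has finished. The following is only a sketch, assuming log aggregation is enabled on the cluster and the `yarn` CLI is on the PATH; the helper names and the application ID are placeholders, not taken from this thread.

```python
import re
import subprocess
from typing import List

# YARN application IDs have the form application_<clusterTimestamp>_<sequence>.
_APP_ID = re.compile(r"application_\d+_\d+")


def yarn_logs_command(app_id: str) -> List[str]:
    """Build the argv for `yarn logs -applicationId <app_id>` (hypothetical helper)."""
    if not _APP_ID.fullmatch(app_id):
        raise ValueError(f"not a YARN application ID: {app_id!r}")
    return ["yarn", "logs", "-applicationId", app_id]


def fetch_yarn_logs(app_id: str) -> str:
    """Run the command and return the aggregated container logs as text."""
    result = subprocess.run(
        yarn_logs_command(app_id), capture_output=True, text=True, check=True
    )
    return result.stdout


if __name__ == "__main__":
    # Placeholder ID; substitute the one shown by the ResourceManager.
    print(" ".join(yarn_logs_command("application_1471276962000_0001")))
```

The ID is validated before the command is run so that a mistyped value fails early instead of producing an empty log.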

(In reply to Miroslav Gajdoš's message of Thu, Aug 18, 2016 at 3:20 PM.)

Miroslav Gajdoš

Re: flink on yarn - Fatal error in AM: The ContainerLaunchContext was not set

Here is the log from the YARN application, this time run on another cluster
(CDH 5.7.0, but with a similar configuration). Check the hostnames: the
configuration uses aliases, and judging by the log (exception at line 87),
the difference from the FQDNs may be the cause.

http://pastebin.com/iimPVbXB

Thanks,
Mira
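
One way to check the alias-versus-FQDN suspicion described above is to compare each host name used in the configuration with the name it resolves to. This is a rough sketch, not something taken from the thread: the function name and the idea of passing in a resolver are assumptions, and the host names would have to be collected from the cluster configuration by hand.

```python
import socket
from typing import Callable, Dict, Iterable


def find_alias_mismatches(
    hostnames: Iterable[str],
    resolve: Callable[[str], str] = socket.getfqdn,
) -> Dict[str, str]:
    """Map each configured name to its resolved FQDN when the two differ."""
    mismatches: Dict[str, str] = {}
    for name in hostnames:
        fqdn = resolve(name)
        # Host names are case-insensitive, so compare them lowercased.
        if fqdn.lower() != name.lower():
            mismatches[name] = fqdn
    return mismatches
```

Calling `find_alias_mismatches(["rm1", "rm2.example.com"])` with hypothetical names like these returns a dictionary of the entries that are aliases on the machine where it runs; an empty dictionary means every configured name already matches its FQDN. The resolver is a parameter so that the check can be exercised without DNS access.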

(In reply to Maximilian Michels's message of Fri, Aug 19, 2016 at 09:12 +0200.)

Maximilian Michels

Re: flink on yarn - Fatal error in AM: The ContainerLaunchContext was not set

Hi Mira,

Does using the fully-qualified hostname solve the issue?

Thanks,
Max

(In reply to Miroslav Gajdoš's message of Mon, Aug 22, 2016 at 1:38 PM.)