Hi!It seems that the accumulator behaves in a non-standard way, but the JobManager should also catch that (log a warning or debug message) and simply continue (not crash).I'll try to add a patch that the JobManager tolerates these kinds of issues in the accumulators.StephanOn Thu, Jan 19, 2017 at 7:26 PM, Dave Marion <[hidden email]> wrote:Noticed I didn't cc the user list.
---------- Original Message ----------
From: Dave Marion <[hidden email]>
To: Ted Yu <[hidden email]>
Date: January 19, 2017 at 12:13 PM
Subject: Re: NPE in JobManagerThat might take some time. Here is a hand typed top N lines. If that is not enough let me know and I will start the process of getting the full stack trace.
NullPointerException
at JobManager$$updateAccumulators
$1.apply(JobManager.scala: 1790) at JobManager$$updateAccumulators
$1.apply(JobManager.scala: 1788) at scala.collection.mutable.Resiz
ableArray$class.forEach(ArrayB uffer.scala:48) at scala.collection.mutable.Array
Buffer.forEach(ArrayBuffer. scala:48) at org.apache.flink.runtime.jobma
nager.JobManager.org $apache$flink$runtime$jobmanager$ JobManager$$updateAccumulators (JobManager.scala:1788) at org.apache.flink.runtime.jobma
nager.JobManager$$anonfun$ handleMessage$1.applyOrElse( JobManager.scala:967) at scala.runtime.AbstractPartialF
unction.apply(AbstractPartialF unction.scala:36) at org.apache.flink.runtime.Leade
rSessionMassageFilter$$anonfun $receive$1.applyOrEslse(Leader SessionMessageFilter.scala:44) at scala.runtime.AbstractPartialF
unction.apply(AbstractPartialF unction.scala:36) at org.apache.flink.runtime.LogMe
ssages$$anon$1.apply(LogMessag es.scala:33) at org.apache.flink.runtime.LogMe
ssages$$anon$1.apply(LogMessag es.scala:28) at scala.PartialFunction$class.ap
plyOrElse(PartialFunction.scal a:123) at org.apache.flink.runtime.LogMe
sages$$anon$1.applyOrElse( LogMessages.scala:28)
On January 19, 2017 at 11:58 AM Ted Yu <[hidden email]> wrote:Can you pastebin the complete stack trace for the NPE ?ThanksOn Thu, Jan 19, 2017 at 8:57 AM, Dave Marion <[hidden email]> wrote:I'm running flink-1.1.4-bin-hadoop27-scala
_2.11 and I'm running into an issue where after some period of time (measured in 1 - 3 hours) the JobManager gets an NPE and shuts itself down. The failure is at JobManager$$updateAccumulators $1.apply(JobManager.scala:1790 ). I'm using a custom accumulator[1], but can't tell from the JobManager code whether the issue is in my Accumulator, or is a bug in the JobManager.
Free forum by Nabble | Edit this page |