On January 20, 2017 at 4:29 AM Stephan Ewen <[hidden email]> wrote:

Hi!

It seems that the accumulator behaves in a non-standard way, but the JobManager should also catch that (log a warning or debug message) and simply continue (not crash).

I'll try to add a patch so that the JobManager tolerates these kinds of issues in the accumulators.

Stephan

On Thu, Jan 19, 2017 at 7:26 PM, Dave Marion <[hidden email]> wrote:

Noticed I didn't cc the user list.
---------- Original Message ----------
From: Dave Marion <[hidden email]>
To: Ted Yu <[hidden email]>
Date: January 19, 2017 at 12:13 PM
Subject: Re: NPE in JobManager

That might take some time. Here is a hand-typed top N lines. If that is not enough, let me know and I will start the process of getting the full stack trace.

NullPointerException
    at JobManager$$updateAccumulators$1.apply(JobManager.scala:1790)
    at JobManager$$updateAccumulators$1.apply(JobManager.scala:1788)
    at scala.collection.mutable.ResizableArray$class.foreach(ArrayBuffer.scala:48)
    at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:48)
    at org.apache.flink.runtime.jobmanager.JobManager.org$apache$flink$runtime$jobmanager$JobManager$$updateAccumulators(JobManager.scala:1788)
    at org.apache.flink.runtime.jobmanager.JobManager$$anonfun$handleMessage$1.applyOrElse(JobManager.scala:967)
    at scala.runtime.AbstractPartialFunction.apply(AbstractPartialFunction.scala:36)
    at org.apache.flink.runtime.LeaderSessionMessageFilter$$anonfun$receive$1.applyOrElse(LeaderSessionMessageFilter.scala:44)
    at scala.runtime.AbstractPartialFunction.apply(AbstractPartialFunction.scala:36)
    at org.apache.flink.runtime.LogMessages$$anon$1.apply(LogMessages.scala:33)
    at org.apache.flink.runtime.LogMessages$$anon$1.apply(LogMessages.scala:28)
    at scala.PartialFunction$class.applyOrElse(PartialFunction.scala:123)
    at org.apache.flink.runtime.LogMessages$$anon$1.applyOrElse(LogMessages.scala:28)

On January 19, 2017 at 11:58 AM Ted Yu <[hidden email]> wrote:

Can you pastebin the complete stack trace for the NPE?

Thanks

On Thu, Jan 19, 2017 at 8:57 AM, Dave Marion <[hidden email]> wrote:

I'm running flink-1.1.4-bin-hadoop27-scala_2.11 and I'm running into an issue where, after some period of time (1 to 3 hours), the JobManager gets an NPE and shuts itself down. The failure is at JobManager$$updateAccumulators$1.apply(JobManager.scala:1790). I'm using a custom accumulator[1], but can't tell from the JobManager code whether the issue is in my accumulator or is a bug in the JobManager.
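
For readers following the thread, here is a minimal sketch of the "catch, log, and continue" behaviour suggested in the reply at the top. It is not Flink's code and not the actual patch: `SimpleAccumulator`, `LongSum`, and `mergeAll` are hypothetical stand-ins, written only to show how a merge loop might skip one misbehaving accumulator (here, a null update) without failing the whole update.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Objects;

final class TolerantMergeSketch {

    /** Hypothetical stand-in for an accumulator; NOT Flink's Accumulator interface. */
    interface SimpleAccumulator {
        long getLocalValue();
        void merge(SimpleAccumulator other);
    }

    /** Hypothetical accumulator that sums long values. */
    static final class LongSum implements SimpleAccumulator {
        private long sum;

        LongSum(long initial) {
            this.sum = initial;
        }

        @Override
        public long getLocalValue() {
            return sum;
        }

        @Override
        public void merge(SimpleAccumulator other) {
            // Throws NullPointerException if 'other' is null.
            sum += other.getLocalValue();
        }
    }

    /**
     * Merges each entry of 'update' into 'target'. A failure in one entry is
     * recorded and skipped so that the remaining entries are still merged.
     * Returns the names of the entries that could not be merged.
     */
    static List<String> mergeAll(Map<String, SimpleAccumulator> target,
                                 Map<String, SimpleAccumulator> update) {
        List<String> skipped = new ArrayList<>();
        for (Map.Entry<String, SimpleAccumulator> entry : update.entrySet()) {
            String name = entry.getKey();
            SimpleAccumulator incoming = entry.getValue();
            try {
                SimpleAccumulator existing = target.get(name);
                if (existing == null) {
                    target.put(name, Objects.requireNonNull(incoming, "null accumulator"));
                } else {
                    existing.merge(incoming);
                }
            } catch (RuntimeException e) {
                // A real system would log a warning here; this sketch only records the name.
                skipped.add(name);
            }
        }
        return skipped;
    }
}
```

With a target holding "a" = 1 and "b" = 10, and an update holding "a" = null and "b" = 5, mergeAll returns a list containing only "a", leaves "a" at 1, and brings "b" to 15. Whether the real NPE comes from a null accumulator or from somewhere else in the custom accumulator cannot be determined from the partial trace above.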