Implementation error: Unhandled exception - "Implementation error: Unhandled exception."

Richard Deurwaarder
Hello,

We have a Flink job/cluster running in Kubernetes on Flink 1.6.2 (the same happens in 1.6.0 and 1.6.1). To upgrade our job we use the REST API.

Every so often the jobmanager seems to be stuck in a crashing state and the logs show me this stack trace:

2018-11-07 18:43:05,815 [flink-scheduler-1] ERROR org.apache.flink.runtime.rest.handler.cluster.ClusterOverviewHandler - Implementation error: Unhandled exception.
akka.pattern.AskTimeoutException: Ask timed out on [Actor[akka://flink/user/dispatcher#1016927511]] after [10000 ms]. Sender[null] sent message of type "org.apache.flink.runtime.rpc.messages.Implementation error: Unhandled exception.".
    at akka.pattern.PromiseActorRef$$anonfun$1.apply$mcV$sp(AskSupport.scala:604)
    at akka.actor.Scheduler$$anon$4.run(Scheduler.scala:126)
    at scala.concurrent.Future$InternalCallbackExecutor$.unbatchedExecute(Future.scala:601)
    at scala.concurrent.BatchingExecutor$class.execute(BatchingExecutor.scala:109)
    at scala.concurrent.Future$InternalCallbackExecutor$.execute(Future.scala:599)
    at akka.actor.LightArrayRevolverScheduler$TaskHolder.executeTask(LightArrayRevolverScheduler.scala:329)
    at akka.actor.LightArrayRevolverScheduler$$anon$4.executeBucket$1(LightArrayRevolverScheduler.scala:280)
    at akka.actor.LightArrayRevolverScheduler$$anon$4.nextTick(LightArrayRevolverScheduler.scala:284)
    at akka.actor.LightArrayRevolverScheduler$$anon$4.run(LightArrayRevolverScheduler.scala:236)
    at java.lang.Thread.run(Thread.java:748)

If I restart the jobmanager everything is fine afterwards, but the jobmanager will not restart by itself.

What might've caused this and is this something we can prevent?

Richard
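
Editor's note: since the jobmanager does not restart by itself, one possible stopgap is to let Kubernetes restart it when the REST endpoint stops answering. The sketch below is not from the thread. It assumes the REST API is reachable on the default port 8081 and that `/overview` (the endpoint served by the ClusterOverviewHandler in the log above) is a reasonable health signal. The function name and the 5-second timeout are hypothetical.

```python
import json
import urllib.error
import urllib.request


def jobmanager_is_responsive(base_url, timeout_s=5.0, opener=urllib.request.urlopen):
    """Return True if GET <base_url>/overview answers with HTTP 200 and valid JSON.

    `opener` is injectable so the check can be exercised without a cluster.
    """
    try:
        with opener(f"{base_url}/overview", timeout=timeout_s) as response:
            if response.status != 200:
                return False
            # A healthy jobmanager returns a JSON document; anything else is a failure.
            json.loads(response.read().decode("utf-8"))
            return True
    except (urllib.error.URLError, OSError, ValueError):
        # Covers HTTP error statuses, connection failures, socket timeouts and bad JSON.
        return False
```

A small wrapper script that exits non-zero when `jobmanager_is_responsive("http://localhost:8081")` returns False could serve as an exec liveness probe. This only works around the symptom; it does not explain why the dispatcher stops answering.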

Re: Implementation error: Unhandled exception - "Implementation error: Unhandled exception."

Timo Walther
Hi Richard,

This sounds like a bug to me. I will loop in Till (in CC), who might know more about this.

Regards,
Timo


In reply to Richard Deurwaarder's message of 7 Nov 2018, 20:35.



Re: Implementation error: Unhandled exception - "Implementation error: Unhandled exception."

Till Rohrmann
Hi Richard,

Could you share the complete logs with us so we can better debug the problem? What exactly do you mean by upgrading your job: cancelling with a savepoint and then resuming the new job from the savepoint? Thanks a lot.

Cheers,
Till
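
Editor's note: to make Till's question concrete, here is a minimal sketch of a cancel-with-savepoint upgrade driven through the REST API. It is not Richard's actual procedure, which the thread does not show. The endpoint paths and JSON field names are the ones Flink's REST API is understood to expose in the 1.6 line and should be checked against the documentation for the version in use. The helper names, the 30-second timeout and the assumption that the new jar is already uploaded are all hypothetical.

```python
import json
import time
import urllib.parse
import urllib.request


def call_json(method, url, body=None, timeout_s=30.0):
    """Send a request with an optional JSON body and decode the JSON response."""
    data = json.dumps(body).encode("utf-8") if body is not None else None
    request = urllib.request.Request(
        url, data=data, method=method, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(request, timeout=timeout_s) as response:
        return json.loads(response.read().decode("utf-8"))


def upgrade_job(base_url, old_job_id, new_jar_id, savepoint_dir, poll_s=2.0, call=call_json):
    """Cancel the old job with a savepoint, then start the new jar from that savepoint."""
    # 1. Trigger a savepoint and ask Flink to cancel the job once it is taken.
    trigger = call(
        "POST",
        f"{base_url}/jobs/{old_job_id}/savepoints",
        {"target-directory": savepoint_dir, "cancel-job": True},
    )
    status_url = f"{base_url}/jobs/{old_job_id}/savepoints/{trigger['request-id']}"

    # 2. Poll the asynchronous operation until it reports completion.
    while True:
        status = call("GET", status_url)
        if status["status"]["id"] == "COMPLETED":
            break
        time.sleep(poll_s)

    operation = status["operation"]
    if "failure-cause" in operation:
        raise RuntimeError(f"savepoint failed: {operation['failure-cause']}")

    # 3. Run the already uploaded jar, restoring state from the savepoint.
    query = urllib.parse.urlencode({"savepointPath": operation["location"]})
    started = call("POST", f"{base_url}/jars/{new_jar_id}/run?{query}")
    return started["jobid"]
```

Every step here goes through the same dispatcher that timed out in the stack trace, so knowing which of these calls precedes the stuck state would help narrow the problem down.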

In reply to Timo Walther's message of Mon, Nov 12, 2018, 5:08 PM.