**RegistrationTimeoutException** after TaskExecutor successfully registered at resource manager

Victor Wong

Hi,

I’m using Flink 1.7.1 and ran into an exception that looks odd to me.

The TaskManager registered successfully at the ResourceManager, but 5 minutes later (the default value of the taskmanager.registration.timeout config option) it threw a RegistrationTimeoutException.

 

Here are the related TaskManager logs:

2019-08-09 01:30:24,061 INFO  org.apache.flink.runtime.taskexecutor.TaskExecutor            - Connecting to ResourceManager akka.tcp://flink@xxx/user/resourcemanager(00000000000000000000000000000000).
2019-08-09 01:30:24,296 INFO  org.apache.flink.runtime.taskexecutor.TaskExecutor            - Resolved ResourceManager address, beginning registration
2019-08-09 01:30:24,296 INFO  org.apache.flink.runtime.taskexecutor.TaskExecutor            - Registration at ResourceManager attempt 1 (timeout=100ms)
2019-08-09 01:30:24,379 INFO  org.apache.flink.runtime.taskexecutor.TaskExecutor            - Successful registration at resource manager akka.tcp://flink@xxx/user/resourcemanager under registration id 4535dea14648f6de68f32fb1a375806e.
2019-08-09 01:30:24,404 INFO  org.apache.flink.runtime.taskexecutor.TaskExecutor            - Receive slot request AllocationID{372d1e10019c93c6c41d52b449cea5f2} for job e7b86795178efe43d7cac107c6cb8c33 from resource manager with leader id 00000000000000000000000000000000.

2019-08-09 01:30:33,590 INFO  org.apache.flink.runtime.taskexecutor.TaskExecutor            - Un-registering task and sending final execution state FINISHED to JobManager for task Source: xxxx
(I don’t know whether this is related, so I include it just in case. This Flink Kafka source finished right away because it was assigned no Kafka partitions: the source parallelism is greater than the number of Kafka topic partitions.)

2019-08-09 01:35:24,753 ERROR org.apache.flink.runtime.taskexecutor.TaskExecutor            - Fatal error occurred in TaskExecutor akka.tcp://flink@xxx/user/taskmanager_0.
org.apache.flink.runtime.taskexecutor.exceptions.RegistrationTimeoutException: Could not register at the ResourceManager within the specified maximum registration duration 300000 ms. This indicates a problem with this instance. Terminating now.
        at org.apache.flink.runtime.taskexecutor.TaskExecutor.registrationTimeout(TaskExecutor.java:1037)
        at org.apache.flink.runtime.taskexecutor.TaskExecutor.lambda$startRegistrationTimeout$3(TaskExecutor.java:1023)
        at org.apache.flink.runtime.rpc.akka.AkkaRpcActor.handleRunAsync(AkkaRpcActor.java:332)
        at org.apache.flink.runtime.rpc.akka.AkkaRpcActor.handleRpcMessage(AkkaRpcActor.java:158)
        at org.apache.flink.runtime.rpc.akka.AkkaRpcActor.onReceive(AkkaRpcActor.java:142)
        at akka.actor.UntypedActor$$anonfun$receive$1.applyOrElse(UntypedActor.scala:165)
        at akka.actor.Actor$class.aroundReceive(Actor.scala:502)
        at akka.actor.UntypedActor.aroundReceive(UntypedActor.scala:95)
        at akka.actor.ActorCell.receiveMessage(ActorCell.scala:526)
        at akka.actor.ActorCell.invoke(ActorCell.scala:495)
        at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:257)
        at akka.dispatch.Mailbox.run(Mailbox.scala:224)
        at akka.dispatch.Mailbox.exec(Mailbox.scala:234)
        at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
        at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
        at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
        at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
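
As a quick sanity check on the timestamps in the log, the following is a minimal sketch that computes how much time passed before the fatal error. The class name RegistrationGap and the method elapsedMillis are made up for illustration and are not part of Flink; the only inputs are the timestamps copied from the log lines. It only does arithmetic on the log and says nothing about the root cause.

```java
import java.time.Duration;
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;

// Hypothetical helper, not part of Flink: measures the gap between two log timestamps.
class RegistrationGap {
    // Timestamp layout used in the log lines, e.g. "2019-08-09 01:30:24,061".
    private static final DateTimeFormatter LOG_TS =
            DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss,SSS");

    // Returns the number of milliseconds between two log timestamps.
    static long elapsedMillis(String from, String to) {
        LocalDateTime start = LocalDateTime.parse(from, LOG_TS);
        LocalDateTime end = LocalDateTime.parse(to, LOG_TS);
        return Duration.between(start, end).toMillis();
    }

    public static void main(String[] args) {
        String connecting = "2019-08-09 01:30:24,061"; // "Connecting to ResourceManager"
        String registered = "2019-08-09 01:30:24,379"; // "Successful registration"
        String fatalError = "2019-08-09 01:35:24,753"; // "Fatal error occurred in TaskExecutor"

        // Gap measured from the first connection attempt: prints 300692
        System.out.println(elapsedMillis(connecting, fatalError));
        // Gap measured from the successful registration: prints 300374
        System.out.println(elapsedMillis(registered, fatalError));
    }
}
```

Both gaps are slightly above the 300000 ms reported in the exception message, so the error arrives roughly one full registration timeout after the TaskExecutor started connecting, even though the registration itself succeeded within about 0.3 seconds.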

 

Thanks,

Victor

Re: **RegistrationTimeoutException** after TaskExecutor successfully registered at resource manager

Biao Liu
Hi Victor,

Several related issues have been reported before [1] [2] [3], and I guess you have run into the same problem.
It was fixed in 1.8 [4]. Could you try a later version (1.8+)?


Thanks,
Biao /'bɪ.aʊ/



On Fri, Aug 9, 2019 at 4:01 PM Victor Wong <[hidden email]> wrote:

Re: **RegistrationTimeoutException** after TaskExecutor successfully registered at resource manager

Victor Wong

Hi Biao,

 

Thanks for your reply. I will give a later version (1.8+) a try!

 

Best,

Victor

 

From: Biao Liu <[hidden email]>
Date: Friday, August 9, 2019 at 5:45 PM
To: Victor Wong <[hidden email]>
Cc: "[hidden email]" <[hidden email]>
Subject: Re: **RegistrationTimeoutException** after TaskExecutor successfully registered at resource manager

 
