Flink AskTimeoutException killing the jobs

Flink AskTimeoutException killing the jobs

M Singh
Hi:

I am using Flink 1.10 on an AWS EMR cluster.

We are getting AskTimeoutExceptions, which are causing the Flink jobs to die.

Caused by: akka.pattern.AskTimeoutException: Ask timed out on [Actor[akka://flink/user/resourcemanager#-1602864959]] after [10000 ms]. Message of type [org.apache.flink.runtime.rpc.messages.LocalFencedMessage]. A typical reason for `AskTimeoutException` is that the recipient actor didn't send a reply.
    at akka.pattern.PromiseActorRef$$anonfun$2.apply(AskSupport.scala:635)
    at akka.pattern.PromiseActorRef$$anonfun$2.apply(AskSupport.scala:635)
    at akka.pattern.PromiseActorRef$$anonfun$1.apply$mcV$sp(AskSupport.scala:648)
    ... 9 more


Can you please let me know where I can set this timeout?

I could not find this specific timeout in the Flink docs (Apache Flink 1.10 Documentation: Configuration).


Thanks

Mans

Re: Flink AskTimeoutException killing the jobs

Xintong Song
The configuration option you're looking for is `akka.ask.timeout`.


However, I'm not sure increasing this setting would help in your case. The error message shows that the timeout occurred on a local message. It is weird that a local message does not get a reply within 10 seconds. I would suggest looking into the JobManager logs and GC logs to see whether any problem prevents the process from handling RPC messages in a timely manner.
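
One way to act on the GC-log suggestion is to scan the JobManager's GC log for long pauses. The sketch below is only illustrative: it assumes a JDK 8 style log written with `-XX:+PrintGCDetails`, where each event ends with a `[Times: user=... sys=..., real=... secs]` entry, and the function name `long_pauses` and the 1-second threshold are made up for this example. Other JVM versions or GC logging flags produce different formats and would need a different pattern.

```python
import re

# Assumption: JDK 8 "-XX:+PrintGCDetails" output, where each GC event ends with
# something like "[Times: user=0.05 sys=0.00, real=0.02 secs]".
REAL_TIME = re.compile(r"real=(\d+(?:\.\d+)?) secs")


def long_pauses(lines, threshold_secs=1.0):
    """Return (line_number, pause_seconds) for GC events at or above the threshold.

    `lines` is any iterable of log lines, for example an open file handle.
    """
    hits = []
    for number, line in enumerate(lines, start=1):
        match = REAL_TIME.search(line)
        if match:
            pause = float(match.group(1))
            if pause >= threshold_secs:
                hits.append((number, pause))
    return hits


# Hypothetical usage (the path is a placeholder, not a real EMR location):
#
#     with open("/path/to/jobmanager-gc.log") as handle:
#         for number, pause in long_pauses(handle, threshold_secs=5.0):
#             print(f"line {number}: GC pause of {pause:.2f} s")
```

Pauses approaching the 10 seconds reported in the exception would be consistent with the process being unable to answer RPC messages in time, though the GC log alone does not prove that this is the cause.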


Thank you~

Xintong Song



On Fri, Jul 3, 2020 at 3:51 AM M Singh <[hidden email]> wrote:
Can you please let me know where I can set this timeout?

Re: Flink AskTimeoutException killing the jobs

M Singh
Hi Xintong/LakeShen:

We have the following settings in flink-conf.yaml:

akka.ask.timeout: 180 s

akka.tcp.timeout: 180 s



But we still see this exception. Are there multiple akka.ask.timeout settings, or are additional settings required?
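
As a quick sanity check, the configured value can be compared with the timeout that the exception message actually reports. The sketch below is a rough illustration, not Flink's own parsing: the helper names and the small unit table are assumptions made for this example, and Flink accepts more duration spellings than are listed here.

```python
import re

# Assumed unit table for illustration only; Flink accepts more spellings.
UNIT_TO_MS = {"ms": 1, "s": 1000, "min": 60_000, "h": 3_600_000}


def duration_to_ms(text):
    """Convert a duration such as '180 s' or '10000 ms' to milliseconds."""
    match = re.fullmatch(r"\s*(\d+)\s*([a-z]+)\s*", text)
    if not match or match.group(2) not in UNIT_TO_MS:
        raise ValueError(f"unrecognised duration: {text!r}")
    return int(match.group(1)) * UNIT_TO_MS[match.group(2)]


def read_flat_conf(lines):
    """Parse flat 'key: value' lines, skipping blank lines and '#' comments."""
    conf = {}
    for line in lines:
        line = line.split("#", 1)[0].strip()
        if ":" in line:
            key, value = line.split(":", 1)
            conf[key.strip()] = value.strip()
    return conf


def timeout_in_trace(message):
    """Extract N from the 'after [N ms]' part of an AskTimeoutException message."""
    match = re.search(r"after \[(\d+) ms\]", message)
    return int(match.group(1)) if match else None
```

With the values from this thread, the configured `akka.ask.timeout` works out to 180000 ms, while the exception reports 10000 ms. That mismatch suggests the failing ask did not run with the configured value: either the setting did not reach the process that threw the exception, or that particular call uses a different timeout.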


Thanks

Mans

On Friday, July 3, 2020, 01:08:05 AM EDT, Xintong Song <[hidden email]> wrote:

The configuration option you're looking for is `akka.ask.timeout`.

Re: Flink AskTimeoutException killing the jobs

Xintong Song
As I already mentioned, I would suggest looking into the JobManager logs and GC logs to see whether any problem prevents the process from handling RPC messages in a timely manner.


The Akka ask timeout does not seem to be the root problem to me.

Thank you~

Xintong Song



On Sat, Jul 4, 2020 at 12:12 AM M Singh <[hidden email]> wrote:
Are there multiple akka.ask.timeout settings, or are additional settings required?

Re: Flink AskTimeoutException killing the jobs

M Singh
Thanks Xintong.  I will check the logs.  

On Sunday, July 5, 2020, 09:29:31 PM EDT, Xintong Song <[hidden email]> wrote:

The Akka ask timeout does not seem to be the root problem to me.