Flink Deployment failing with RestClientException

classic Classic list List threaded Threaded
3 messages Options
Reply | Threaded
Open this post in threaded view
|

Flink Deployment failing with RestClientException

Samir Chauhan

Hi,

I am having issue where after deploying few jobs, it starts failing with below errors. I don’t have such issue in other environments. What should I check first in such scenario?

My environment is

Azure Kubernetes 1.15.7

Flink 1.6.0

Zookeeper 3.4.10

 

The program finished with the following exception:

org.apache.flink.client.program.ProgramInvocationException: Could not submit job (JobID: e83db2da358db355ccdcf6740c6bb134)
        at org.apache.flink.client.program.rest.RestClusterClient.submitJob(RestClusterClient.java:249)
        at org.apache.flink.client.program.ClusterClient.run(ClusterClient.java:486)
        at org.apache.flink.client.program.DetachedEnvironment.finalizeExecute(DetachedEnvironment.java:77)
        at org.apache.flink.client.program.ClusterClient.run(ClusterClient.java:432)
        at org.apache.flink.client.cli.CliFrontend.executeProgram(CliFrontend.java:804)
        at org.apache.flink.client.cli.CliFrontend.runProgram(CliFrontend.java:280)
        at org.apache.flink.client.cli.CliFrontend.run(CliFrontend.java:215)
        at org.apache.flink.client.cli.CliFrontend.parseParameters(CliFrontend.java:1044)
        at org.apache.flink.client.cli.CliFrontend.lambda$main$11(CliFrontend.java:1120)
        at org.apache.flink.runtime.security.NoOpSecurityContext.runSecured(NoOpSecurityContext.java:30)
        at org.apache.flink.client.cli.CliFrontend.main(CliFrontend.java:1120)
Caused by: org.apache.flink.runtime.client.JobSubmissionException: Failed to submit JobGraph.
        at org.apache.flink.client.program.rest.RestClusterClient.lambda$submitJob$8(RestClusterClient.java:379)
        at java.util.concurrent.CompletableFuture.uniExceptionally(CompletableFuture.java:870)
        at java.util.concurrent.CompletableFuture$UniExceptionally.tryFire(CompletableFuture.java:852)
        at java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:474)
        at java.util.concurrent.CompletableFuture.completeExceptionally(CompletableFuture.java:1977)
        at org.apache.flink.runtime.concurrent.FutureUtils.lambda$retryOperationWithDelay$5(FutureUtils.java:213)
        at java.util.concurrent.CompletableFuture.uniWhenComplete(CompletableFuture.java:760)
        at java.util.concurrent.CompletableFuture$UniWhenComplete.tryFire(CompletableFuture.java:736)
        at java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:474)
        at java.util.concurrent.CompletableFuture.postFire(CompletableFuture.java:561)
        at java.util.concurrent.CompletableFuture$UniCompose.tryFire(CompletableFuture.java:929)
        at java.util.concurrent.CompletableFuture$Completion.run(CompletableFuture.java:442)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
        at java.lang.Thread.run(Thread.java:748)
Caused by: java.util.concurrent.CompletionException: org.apache.flink.runtime.concurrent.FutureUtils$RetryException: Could not complete the operation. Exception is not retryable.
        at java.util.concurrent.CompletableFuture.encodeRelay(CompletableFuture.java:326)
        at java.util.concurrent.CompletableFuture.completeRelay(CompletableFuture.java:338)
        at java.util.concurrent.CompletableFuture.uniRelay(CompletableFuture.java:911)
        at java.util.concurrent.CompletableFuture$UniRelay.tryFire(CompletableFuture.java:899)
        ... 12 more
Caused by: org.apache.flink.runtime.concurrent.FutureUtils$RetryException: Could not complete the operation. Exception is not retryable.
        ... 10 more
Caused by: java.util.concurrent.CompletionException: org.apache.flink.runtime.rest.util.RestClientException: [Job submission failed.]
        at java.util.concurrent.CompletableFuture.encodeRelay(CompletableFuture.java:326)
        at java.util.concurrent.CompletableFuture.completeRelay(CompletableFuture.java:338)
        at java.util.concurrent.CompletableFuture.uniRelay(CompletableFuture.java:911)
        at java.util.concurrent.CompletableFuture.uniCompose(CompletableFuture.java:953)
        at java.util.concurrent.CompletableFuture$UniCompose.tryFire(CompletableFuture.java:926)
        ... 4 more
Caused by: org.apache.flink.runtime.rest.util.RestClientException: [Job submission failed.]
        at org.apache.flink.runtime.rest.RestClient.parseResponse(RestClient.java:310)
        at org.apache.flink.runtime.rest.RestClient.lambda$submitRequest$3(RestClient.java:294)
        at java.util.concurrent.CompletableFuture.uniCompose(CompletableFuture.java:952)
        ... 5 more

 

 

More errors

at java.lang.Thread.run(Thread.java:748)
Caused by: akka.pattern.AskTimeoutException: Ask timed out on [Actor[akka://flink/user/dispatcher#-1725880087]] after [10000 ms]. Sender[null] sent message of type "org.apache.flink.runtime.rpc.messages.LocalFencedMessage".
        at akka.pattern.PromiseActorRef$$anonfun$1.apply$mcV$sp(AskSupport.scala:604)
        ... 9 more
2020-03-04 08:39:06,675 ERROR org.apache.flink.runtime.rest.handler.cluster.ClusterOverviewHandler  - Could not retrieve the redirect address.
java.util.concurrent.CompletionException: akka.pattern.AskTimeoutException: Ask timed out on [Actor[akka://flink/user/dispatcher#-1725880087]] after [10000 ms]. Sender[null] sent message of type "org.apache.flink.runtime.rpc.messages.LocalFencedMessage".
        at java.util.concurrent.CompletableFuture.encodeThrowable(CompletableFuture.java:292)
        at java.util.concurrent.CompletableFuture.completeThrowable(CompletableFuture.java:308)
        at java.util.concurrent.CompletableFuture.uniApply(CompletableFuture.java:593)
        at java.util.concurrent.CompletableFuture$UniApply.tryFire(CompletableFuture.java:577)
        at java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:474)
        at java.util.concurrent.CompletableFuture.completeExceptionally(CompletableFuture.java:1977)
        at org.apache.flink.runtime.concurrent.FutureUtils$1.onComplete(FutureUtils.java:770)
        at akka.dispatch.OnComplete.internal(Future.scala:258)
        at akka.dispatch.OnComplete.internal(Future.scala:256)
        at akka.dispatch.japi$CallbackBridge.apply(Future.scala:186)
        at akka.dispatch.japi$CallbackBridge.apply(Future.scala:183)
        at scala.concurrent.impl.CallbackRunnable.run(Promise.scala:36)
        at org.apache.flink.runtime.concurrent.Executors$DirectExecutionContext.execute(Executors.java:83)
        at scala.concurrent.impl.CallbackRunnable.executeWithValue(Promise.scala:44)
        at scala.concurrent.impl.Promise$DefaultPromise.tryComplete(Promise.scala:252)
        at akka.pattern.PromiseActorRef$$anonfun$1.apply$mcV$sp(AskSupport.scala:603)
        at akka.actor.Scheduler$$anon$4.run(Scheduler.scala:126)
        at scala.concurrent.Future$InternalCallbackExecutor$.unbatchedExecute(Future.scala:601)
        at scala.concurrent.BatchingExecutor$class.execute(BatchingExecutor.scala:109)
        at scala.concurrent.Future$InternalCallbackExecutor$.execute(Future.scala:599)
        at akka.actor.LightArrayRevolverScheduler$TaskHolder.executeTask(LightArrayRevolverScheduler.scala:329)
        at akka.actor.LightArrayRevolverScheduler$$anon$4.executeBucket$1(LightArrayRevolverScheduler.scala:280)
        at akka.actor.LightArrayRevolverScheduler$$anon$4.nextTick(LightArrayRevolverScheduler.scala:284)
        at akka.actor.LightArrayRevolverScheduler$$anon$4.run(LightArrayRevolverScheduler.scala:236)
        at java.lang.Thread.run(Thread.java:748)
Caused by: akka.pattern.AskTimeoutException: Ask timed out on [Actor[akka://flink/user/dispatcher#-1725880087]] after [10000 ms]. Sender[null] sent message of type "org.apache.flink.runtime.rpc.messages.LocalFencedMessage".
        at akka.pattern.PromiseActorRef$$anonfun$1.apply$mcV$sp(AskSupport.scala:604)
        ... 9 more
2020-03-04 08:39:07,676 ERROR org.apache.flink.runtime.rest.handler.job.JobsOverviewHandler  - Could not retrieve the redirect address.
java.util.concurrent.CompletionException: akka.pattern.AskTimeoutException: Ask timed out on [Actor[akka://flink/user/dispatcher#-1725880087]] after [10000 ms]. Sender[null] sent message of type "org.apache.flink.runtime.rpc.messages.LocalFencedMessage".
        at java.util.concurrent.CompletableFuture.encodeThrowable(CompletableFuture.java:292)
        at java.util.concurrent.CompletableFuture.completeThrowable(CompletableFuture.java:308)
        at java.util.concurrent.CompletableFuture.uniApply(CompletableFuture.java:593)
        at java.util.concurrent.CompletableFuture$UniApply.tryFire(CompletableFuture.java:577)
        at java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:474)
        at java.util.concurrent.CompletableFuture.completeExceptionally(CompletableFuture.java:1977)
        at org.apache.flink.runtime.concurrent.FutureUtils$1.onComplete(FutureUtils.java:770)
        at akka.dispatch.OnComplete.internal(Future.scala:258)
        at akka.dispatch.OnComplete.internal(Future.scala:256)
        at akka.dispatch.japi$CallbackBridge.apply(Future.scala:186)
        at akka.dispatch.japi$CallbackBridge.apply(Future.scala:183)
        at scala.concurrent.impl.CallbackRunnable.run(Promise.scala:36)
        at org.apache.flink.runtime.concurrent.Executors$DirectExecutionContext.execute(Executors.java:83)
        at scala.concurrent.impl.CallbackRunnable.executeWithValue(Promise.scala:44)
        at scala.concurrent.impl.Promise$DefaultPromise.tryComplete(Promise.scala:252)
        at akka.pattern.PromiseActorRef$$anonfun$1.apply$mcV$sp(AskSupport.scala:603)
        at akka.actor.Scheduler$$anon$4.run(Scheduler.scala:126)
        at scala.concurrent.Future$InternalCallbackExecutor$.unbatchedExecute(Future.scala:601)
        at scala.concurrent.BatchingExecutor$class.execute(BatchingExecutor.scala:109)
        at scala.concurrent.Future$InternalCallbackExecutor$.execute(Future.scala:599)
        at akka.actor.LightArrayRevolverScheduler$TaskHolder.executeTask(LightArrayRevolverScheduler.scala:329)
        at akka.actor.LightArrayRevolverScheduler$$anon$4.executeBucket$1(LightArrayRevolverScheduler.scala:280)
        at akka.actor.LightArrayRevolverScheduler$$anon$4.nextTick(LightArrayRevolverScheduler.scala:284)
        at akka.actor.LightArrayRevolverScheduler$$anon$4.run(LightArrayRevolverScheduler.scala:236)
        at java.lang.Thread.run(Thread.java:748)
Caused by: akka.pattern.AskTimeoutException: Ask timed out on [Actor[akka://flink/user/dispatcher#-1725880087]] after [10000 ms]. Sender[null] sent message of type "org.apache.flink.runtime.rpc.messages.LocalFencedMessage".
        at akka.pattern.PromiseActorRef$$anonfun$1.apply$mcV$sp(AskSupport.scala:604)

 

 

 

Warm Regards,

Samir Chauhan

 

Regional Infrastructure & Operations

 

cid:image002.png@01D12B8E.C23F3E10

 

Prudential Services Singapore Pte Ltd

1 Wallich Street #19-01, Guoco Tower Singapore 078881

 

Direct (65) 6704 7264 Mobile (65) 9721 7548

[hidden email]

 

www.prudential.com.sg

 


There's a reason we support Fair Dealing. YOU.


This email and any files transmitted with it or attached to it (the [Email]) may contain confidential, proprietary or legally privileged information and is intended solely for the use of the individual or entity to whom it is addressed. If you are not the intended recipient of the Email, you must not, directly or indirectly, copy, use, print, distribute, disclose to any other party or take any action in reliance on any part of the Email. Please notify the system manager or sender of the error and delete all copies of the Email immediately.

No statement in the Email should be construed as investment advice being given within or outside Singapore. Prudential Assurance Company Singapore (Pte) Limited (PACS) and each of its related entities shall not be responsible for any losses, claims, penalties, costs or damages arising from or in connection with the use of the Email or the information therein, in whole or in part. You are solely responsible for conducting any virus checks prior to opening, accessing or disseminating the Email.

PACS (Company Registration No. 199002477Z) is a company incorporated under the laws of Singapore and has its registered office at 30 Cecil Street, #30-01, Prudential Tower, Singapore 049712.

PACS is an indirect wholly owned subsidiary of Prudential plc of the United Kingdom. PACS and Prudential plc are not affiliated in any manner with Prudential Financial, Inc., a company whose principal place of business is in the United States of America.
Reply | Threaded
Open this post in threaded view
|

Re: Flink Deployment failing with RestClientException

Andrey Zagrebin-4
Hi Samir,

It may be a known issue [1][2] where some action during job submission takes too long time but eventually completes in job manager.
Have you checked job manager logs whether there are any other failures, not “Ask timed out"?
Have you checked Web UI whether all the jobs have been started in fact despite the client error?

Best,
Andrey

[2] https://issues.apache.org/jira/browse/FLINK-16018

On 5 Mar 2020, at 17:49, Samir Tusharbhai Chauhan <[hidden email]> wrote:

Hi,

I am having issue where after deploying few jobs, it starts failing with below errors. I don’t have such issue in other environments. What should I check first in such scenario?

My environment is
Azure Kubernetes 1.15.7
Flink 1.6.0
Zookeeper 3.4.10

 

The program finished with the following exception:
org.apache.flink.client.program.ProgramInvocationException: Could not submit job (JobID: e83db2da358db355ccdcf6740c6bb134)
        at org.apache.flink.client.program.rest.RestClusterClient.submitJob(RestClusterClient.java:249)
        at org.apache.flink.client.program.ClusterClient.run(ClusterClient.java:486)
        at org.apache.flink.client.program.DetachedEnvironment.finalizeExecute(DetachedEnvironment.java:77)
        at org.apache.flink.client.program.ClusterClient.run(ClusterClient.java:432)
        at org.apache.flink.client.cli.CliFrontend.executeProgram(CliFrontend.java:804)
        at org.apache.flink.client.cli.CliFrontend.runProgram(CliFrontend.java:280)
        at org.apache.flink.client.cli.CliFrontend.run(CliFrontend.java:215)
        at org.apache.flink.client.cli.CliFrontend.parseParameters(CliFrontend.java:1044)
        at org.apache.flink.client.cli.CliFrontend.lambda$main$11(CliFrontend.java:1120)
        at org.apache.flink.runtime.security.NoOpSecurityContext.runSecured(NoOpSecurityContext.java:30)
        at org.apache.flink.client.cli.CliFrontend.main(CliFrontend.java:1120)
Caused by: org.apache.flink.runtime.client.JobSubmissionException: Failed to submit JobGraph.
        at org.apache.flink.client.program.rest.RestClusterClient.lambda$submitJob$8(RestClusterClient.java:379)
        at java.util.concurrent.CompletableFuture.uniExceptionally(CompletableFuture.java:870)
        at java.util.concurrent.CompletableFuture$UniExceptionally.tryFire(CompletableFuture.java:852)
        at java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:474)
        at java.util.concurrent.CompletableFuture.completeExceptionally(CompletableFuture.java:1977)
        at org.apache.flink.runtime.concurrent.FutureUtils.lambda$retryOperationWithDelay$5(FutureUtils.java:213)
        at java.util.concurrent.CompletableFuture.uniWhenComplete(CompletableFuture.java:760)
        at java.util.concurrent.CompletableFuture$UniWhenComplete.tryFire(CompletableFuture.java:736)
        at java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:474)
        at java.util.concurrent.CompletableFuture.postFire(CompletableFuture.java:561)
        at java.util.concurrent.CompletableFuture$UniCompose.tryFire(CompletableFuture.java:929)
        at java.util.concurrent.CompletableFuture$Completion.run(CompletableFuture.java:442)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
        at java.lang.Thread.run(Thread.java:748)
Caused by: java.util.concurrent.CompletionException: org.apache.flink.runtime.concurrent.FutureUtils$RetryException: Could not complete the operation. Exception is not retryable.
        at java.util.concurrent.CompletableFuture.encodeRelay(CompletableFuture.java:326)
        at java.util.concurrent.CompletableFuture.completeRelay(CompletableFuture.java:338)
        at java.util.concurrent.CompletableFuture.uniRelay(CompletableFuture.java:911)
        at java.util.concurrent.CompletableFuture$UniRelay.tryFire(CompletableFuture.java:899)
        ... 12 more
Caused by: org.apache.flink.runtime.concurrent.FutureUtils$RetryException: Could not complete the operation. Exception is not retryable.
        ... 10 more
Caused by: java.util.concurrent.CompletionException: org.apache.flink.runtime.rest.util.RestClientException: [Job submission failed.]
        at java.util.concurrent.CompletableFuture.encodeRelay(CompletableFuture.java:326)
        at java.util.concurrent.CompletableFuture.completeRelay(CompletableFuture.java:338)
        at java.util.concurrent.CompletableFuture.uniRelay(CompletableFuture.java:911)
        at java.util.concurrent.CompletableFuture.uniCompose(CompletableFuture.java:953)
        at java.util.concurrent.CompletableFuture$UniCompose.tryFire(CompletableFuture.java:926)
        ... 4 more
Caused by: org.apache.flink.runtime.rest.util.RestClientException: [Job submission failed.]
        at org.apache.flink.runtime.rest.RestClient.parseResponse(RestClient.java:310)
        at org.apache.flink.runtime.rest.RestClient.lambda$submitRequest$3(RestClient.java:294)
        at java.util.concurrent.CompletableFuture.uniCompose(CompletableFuture.java:952)
        ... 5 more
 

 

More errors

at java.lang.Thread.run(Thread.java:748)
Caused by: akka.pattern.AskTimeoutException: Ask timed out on [Actor[<a href="akka://flink/user/dispatcher#-1725880087" style="color: rgb(149, 79, 114); text-decoration: underline;" class="">akka://flink/user/dispatcher#-1725880087]] after [10000 ms]. Sender[null] sent message of type "org.apache.flink.runtime.rpc.messages.LocalFencedMessage".
        at akka.pattern.PromiseActorRef$$anonfun$1.apply$mcV$sp(AskSupport.scala:604)
        ... 9 more
2020-03-04 08:39:06,675 ERROR org.apache.flink.runtime.rest.handler.cluster.ClusterOverviewHandler  - Could not retrieve the redirect address.
java.util.concurrent.CompletionException: akka.pattern.AskTimeoutException: Ask timed out on [Actor[<a href="akka://flink/user/dispatcher#-1725880087" style="color: rgb(149, 79, 114); text-decoration: underline;" class="">akka://flink/user/dispatcher#-1725880087]] after [10000 ms]. Sender[null] sent message of type "org.apache.flink.runtime.rpc.messages.LocalFencedMessage".
        at java.util.concurrent.CompletableFuture.encodeThrowable(CompletableFuture.java:292)
        at java.util.concurrent.CompletableFuture.completeThrowable(CompletableFuture.java:308)
        at java.util.concurrent.CompletableFuture.uniApply(CompletableFuture.java:593)
        at java.util.concurrent.CompletableFuture$UniApply.tryFire(CompletableFuture.java:577)
        at java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:474)
        at java.util.concurrent.CompletableFuture.completeExceptionally(CompletableFuture.java:1977)
        at org.apache.flink.runtime.concurrent.FutureUtils$1.onComplete(FutureUtils.java:770)
        at akka.dispatch.OnComplete.internal(Future.scala:258)
        at akka.dispatch.OnComplete.internal(Future.scala:256)
        at akka.dispatch.japi$CallbackBridge.apply(Future.scala:186)
        at akka.dispatch.japi$CallbackBridge.apply(Future.scala:183)
        at scala.concurrent.impl.CallbackRunnable.run(Promise.scala:36)
        at org.apache.flink.runtime.concurrent.Executors$DirectExecutionContext.execute(Executors.java:83)
        at scala.concurrent.impl.CallbackRunnable.executeWithValue(Promise.scala:44)
        at scala.concurrent.impl.Promise$DefaultPromise.tryComplete(Promise.scala:252)
        at akka.pattern.PromiseActorRef$$anonfun$1.apply$mcV$sp(AskSupport.scala:603)
        at akka.actor.Scheduler$$anon$4.run(Scheduler.scala:126)
        at scala.concurrent.Future$InternalCallbackExecutor$.unbatchedExecute(Future.scala:601)
        at scala.concurrent.BatchingExecutor$class.execute(BatchingExecutor.scala:109)
        at scala.concurrent.Future$InternalCallbackExecutor$.execute(Future.scala:599)
        at akka.actor.LightArrayRevolverScheduler$TaskHolder.executeTask(LightArrayRevolverScheduler.scala:329)
        at akka.actor.LightArrayRevolverScheduler$$anon$4.executeBucket$1(LightArrayRevolverScheduler.scala:280)
        at akka.actor.LightArrayRevolverScheduler$$anon$4.nextTick(LightArrayRevolverScheduler.scala:284)
        at akka.actor.LightArrayRevolverScheduler$$anon$4.run(LightArrayRevolverScheduler.scala:236)
        at java.lang.Thread.run(Thread.java:748)
Caused by: akka.pattern.AskTimeoutException: Ask timed out on [Actor[<a href="akka://flink/user/dispatcher#-1725880087" style="color: rgb(149, 79, 114); text-decoration: underline;" class="">akka://flink/user/dispatcher#-1725880087]] after [10000 ms]. Sender[null] sent message of type "org.apache.flink.runtime.rpc.messages.LocalFencedMessage".
        at akka.pattern.PromiseActorRef$$anonfun$1.apply$mcV$sp(AskSupport.scala:604)
        ... 9 more
2020-03-04 08:39:07,676 ERROR org.apache.flink.runtime.rest.handler.job.JobsOverviewHandler  - Could not retrieve the redirect address.
java.util.concurrent.CompletionException: akka.pattern.AskTimeoutException: Ask timed out on [Actor[<a href="akka://flink/user/dispatcher#-1725880087" style="color: rgb(149, 79, 114); text-decoration: underline;" class="">akka://flink/user/dispatcher#-1725880087]] after [10000 ms]. Sender[null] sent message of type "org.apache.flink.runtime.rpc.messages.LocalFencedMessage".
        at java.util.concurrent.CompletableFuture.encodeThrowable(CompletableFuture.java:292)
        at java.util.concurrent.CompletableFuture.completeThrowable(CompletableFuture.java:308)
        at java.util.concurrent.CompletableFuture.uniApply(CompletableFuture.java:593)
        at java.util.concurrent.CompletableFuture$UniApply.tryFire(CompletableFuture.java:577)
        at java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:474)
        at java.util.concurrent.CompletableFuture.completeExceptionally(CompletableFuture.java:1977)
        at org.apache.flink.runtime.concurrent.FutureUtils$1.onComplete(FutureUtils.java:770)
        at akka.dispatch.OnComplete.internal(Future.scala:258)
        at akka.dispatch.OnComplete.internal(Future.scala:256)
        at akka.dispatch.japi$CallbackBridge.apply(Future.scala:186)
        at akka.dispatch.japi$CallbackBridge.apply(Future.scala:183)
        at scala.concurrent.impl.CallbackRunnable.run(Promise.scala:36)
        at org.apache.flink.runtime.concurrent.Executors$DirectExecutionContext.execute(Executors.java:83)
        at scala.concurrent.impl.CallbackRunnable.executeWithValue(Promise.scala:44)
        at scala.concurrent.impl.Promise$DefaultPromise.tryComplete(Promise.scala:252)
        at akka.pattern.PromiseActorRef$$anonfun$1.apply$mcV$sp(AskSupport.scala:603)
        at akka.actor.Scheduler$$anon$4.run(Scheduler.scala:126)
        at scala.concurrent.Future$InternalCallbackExecutor$.unbatchedExecute(Future.scala:601)
        at scala.concurrent.BatchingExecutor$class.execute(BatchingExecutor.scala:109)
        at scala.concurrent.Future$InternalCallbackExecutor$.execute(Future.scala:599)
        at akka.actor.LightArrayRevolverScheduler$TaskHolder.executeTask(LightArrayRevolverScheduler.scala:329)
        at akka.actor.LightArrayRevolverScheduler$$anon$4.executeBucket$1(LightArrayRevolverScheduler.scala:280)
        at akka.actor.LightArrayRevolverScheduler$$anon$4.nextTick(LightArrayRevolverScheduler.scala:284)
        at akka.actor.LightArrayRevolverScheduler$$anon$4.run(LightArrayRevolverScheduler.scala:236)
        at java.lang.Thread.run(Thread.java:748)
Caused by: akka.pattern.AskTimeoutException: Ask timed out on [Actor[<a href="akka://flink/user/dispatcher#-1725880087" style="color: rgb(149, 79, 114); text-decoration: underline;" class="">akka://flink/user/dispatcher#-1725880087]] after [10000 ms]. Sender[null] sent message of type "org.apache.flink.runtime.rpc.messages.LocalFencedMessage".
        at akka.pattern.PromiseActorRef$$anonfun$1.apply$mcV$sp(AskSupport.scala:604)
 

 

 
Warm Regards,
Samir Chauhan
 
Regional Infrastructure & Operations
 
<image001.png>
 
Prudential Services Singapore Pte Ltd
1 Wallich Street #19-01, Guoco Tower Singapore 078881
 
Direct (65) 6704 7264 Mobile (65) 9721 7548
 

 


There's a reason we support Fair Dealing. YOU.


This email and any files transmitted with it or attached to it (the [Email]) may contain confidential, proprietary or legally privileged information and is intended solely for the use of the individual or entity to whom it is addressed. If you are not the intended recipient of the Email, you must not, directly or indirectly, copy, use, print, distribute, disclose to any other party or take any action in reliance on any part of the Email. Please notify the system manager or sender of the error and delete all copies of the Email immediately.  

No statement in the Email should be construed as investment advice being given within or outside Singapore. Prudential Assurance Company Singapore (Pte) Limited (PACS) and each of its related entities shall not be responsible for any losses, claims, penalties, costs or damages arising from or in connection with the use of the Email or the information therein, in whole or in part. You are solely responsible for conducting any virus checks prior to opening, accessing or disseminating the Email.

PACS (Company Registration No. 199002477Z) is a company incorporated under the laws of Singapore and has its registered office at 30 Cecil Street, #30-01, Prudential Tower, Singapore 049712.

PACS is an indirect wholly owned subsidiary of Prudential plc of the United Kingdom. PACS and Prudential plc are not affiliated in any manner with Prudential Financial, Inc., a company whose principal place of business is in the United States of America.

Reply | Threaded
Open this post in threaded view
|

Re: Flink Deployment failing with RestClientException

rmetzger0
Hey Samir,

can you try setting the following configuration parameter (make sure the JobManager log confirms that the changed value is in effect)
web.timeout: 300000

This might uncover the underlying problem (as we are waiting longer for the underlying issue to timeout).

Are you able to upgrade to the latest Flink version easily?


On Thu, Mar 5, 2020 at 7:02 PM Andrey Zagrebin <[hidden email]> wrote:
Hi Samir,

It may be a known issue [1][2] where some action during job submission takes too long time but eventually completes in job manager.
Have you checked job manager logs whether there are any other failures, not “Ask timed out"?
Have you checked Web UI whether all the jobs have been started in fact despite the client error?

Best,
Andrey

[2] https://issues.apache.org/jira/browse/FLINK-16018

On 5 Mar 2020, at 17:49, Samir Tusharbhai Chauhan <[hidden email]> wrote:

Hi,

I am having issue where after deploying few jobs, it starts failing with below errors. I don’t have such issue in other environments. What should I check first in such scenario?

My environment is
Azure Kubernetes 1.15.7
Flink 1.6.0
Zookeeper 3.4.10

 

The program finished with the following exception:
org.apache.flink.client.program.ProgramInvocationException: Could not submit job (JobID: e83db2da358db355ccdcf6740c6bb134)
        at org.apache.flink.client.program.rest.RestClusterClient.submitJob(RestClusterClient.java:249)
        at org.apache.flink.client.program.ClusterClient.run(ClusterClient.java:486)
        at org.apache.flink.client.program.DetachedEnvironment.finalizeExecute(DetachedEnvironment.java:77)
        at org.apache.flink.client.program.ClusterClient.run(ClusterClient.java:432)
        at org.apache.flink.client.cli.CliFrontend.executeProgram(CliFrontend.java:804)
        at org.apache.flink.client.cli.CliFrontend.runProgram(CliFrontend.java:280)
        at org.apache.flink.client.cli.CliFrontend.run(CliFrontend.java:215)
        at org.apache.flink.client.cli.CliFrontend.parseParameters(CliFrontend.java:1044)
        at org.apache.flink.client.cli.CliFrontend.lambda$main$11(CliFrontend.java:1120)
        at org.apache.flink.runtime.security.NoOpSecurityContext.runSecured(NoOpSecurityContext.java:30)
        at org.apache.flink.client.cli.CliFrontend.main(CliFrontend.java:1120)
Caused by: org.apache.flink.runtime.client.JobSubmissionException: Failed to submit JobGraph.
        at org.apache.flink.client.program.rest.RestClusterClient.lambda$submitJob$8(RestClusterClient.java:379)
        at java.util.concurrent.CompletableFuture.uniExceptionally(CompletableFuture.java:870)
        at java.util.concurrent.CompletableFuture$UniExceptionally.tryFire(CompletableFuture.java:852)
        at java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:474)
        at java.util.concurrent.CompletableFuture.completeExceptionally(CompletableFuture.java:1977)
        at org.apache.flink.runtime.concurrent.FutureUtils.lambda$retryOperationWithDelay$5(FutureUtils.java:213)
        at java.util.concurrent.CompletableFuture.uniWhenComplete(CompletableFuture.java:760)
        at java.util.concurrent.CompletableFuture$UniWhenComplete.tryFire(CompletableFuture.java:736)
        at java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:474)
        at java.util.concurrent.CompletableFuture.postFire(CompletableFuture.java:561)
        at java.util.concurrent.CompletableFuture$UniCompose.tryFire(CompletableFuture.java:929)
        at java.util.concurrent.CompletableFuture$Completion.run(CompletableFuture.java:442)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
        at java.lang.Thread.run(Thread.java:748)
Caused by: java.util.concurrent.CompletionException: org.apache.flink.runtime.concurrent.FutureUtils$RetryException: Could not complete the operation. Exception is not retryable.
        at java.util.concurrent.CompletableFuture.encodeRelay(CompletableFuture.java:326)
        at java.util.concurrent.CompletableFuture.completeRelay(CompletableFuture.java:338)
        at java.util.concurrent.CompletableFuture.uniRelay(CompletableFuture.java:911)
        at java.util.concurrent.CompletableFuture$UniRelay.tryFire(CompletableFuture.java:899)
        ... 12 more
Caused by: org.apache.flink.runtime.concurrent.FutureUtils$RetryException: Could not complete the operation. Exception is not retryable.
        ... 10 more
Caused by: java.util.concurrent.CompletionException: org.apache.flink.runtime.rest.util.RestClientException: [Job submission failed.]
        at java.util.concurrent.CompletableFuture.encodeRelay(CompletableFuture.java:326)
        at java.util.concurrent.CompletableFuture.completeRelay(CompletableFuture.java:338)
        at java.util.concurrent.CompletableFuture.uniRelay(CompletableFuture.java:911)
        at java.util.concurrent.CompletableFuture.uniCompose(CompletableFuture.java:953)
        at java.util.concurrent.CompletableFuture$UniCompose.tryFire(CompletableFuture.java:926)
        ... 4 more
Caused by: org.apache.flink.runtime.rest.util.RestClientException: [Job submission failed.]
        at org.apache.flink.runtime.rest.RestClient.parseResponse(RestClient.java:310)
        at org.apache.flink.runtime.rest.RestClient.lambda$submitRequest$3(RestClient.java:294)
        at java.util.concurrent.CompletableFuture.uniCompose(CompletableFuture.java:952)
        ... 5 more
 

 

More errors

at java.lang.Thread.run(Thread.java:748)
Caused by: akka.pattern.AskTimeoutException: Ask timed out on [Actor[akka://flink/user/dispatcher#-1725880087]] after [10000 ms]. Sender[null] sent message of type "org.apache.flink.runtime.rpc.messages.LocalFencedMessage".
        at akka.pattern.PromiseActorRef$$anonfun$1.apply$mcV$sp(AskSupport.scala:604)
        ... 9 more
2020-03-04 08:39:06,675 ERROR org.apache.flink.runtime.rest.handler.cluster.ClusterOverviewHandler  - Could not retrieve the redirect address.
java.util.concurrent.CompletionException: akka.pattern.AskTimeoutException: Ask timed out on [Actor[akka://flink/user/dispatcher#-1725880087]] after [10000 ms]. Sender[null] sent message of type "org.apache.flink.runtime.rpc.messages.LocalFencedMessage".
        at java.util.concurrent.CompletableFuture.encodeThrowable(CompletableFuture.java:292)
        at java.util.concurrent.CompletableFuture.completeThrowable(CompletableFuture.java:308)
        at java.util.concurrent.CompletableFuture.uniApply(CompletableFuture.java:593)
        at java.util.concurrent.CompletableFuture$UniApply.tryFire(CompletableFuture.java:577)
        at java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:474)
        at java.util.concurrent.CompletableFuture.completeExceptionally(CompletableFuture.java:1977)
        at org.apache.flink.runtime.concurrent.FutureUtils$1.onComplete(FutureUtils.java:770)
        at akka.dispatch.OnComplete.internal(Future.scala:258)
        at akka.dispatch.OnComplete.internal(Future.scala:256)
        at akka.dispatch.japi$CallbackBridge.apply(Future.scala:186)
        at akka.dispatch.japi$CallbackBridge.apply(Future.scala:183)
        at scala.concurrent.impl.CallbackRunnable.run(Promise.scala:36)
        at org.apache.flink.runtime.concurrent.Executors$DirectExecutionContext.execute(Executors.java:83)
        at scala.concurrent.impl.CallbackRunnable.executeWithValue(Promise.scala:44)
        at scala.concurrent.impl.Promise$DefaultPromise.tryComplete(Promise.scala:252)
        at akka.pattern.PromiseActorRef$$anonfun$1.apply$mcV$sp(AskSupport.scala:603)
        at akka.actor.Scheduler$$anon$4.run(Scheduler.scala:126)
        at scala.concurrent.Future$InternalCallbackExecutor$.unbatchedExecute(Future.scala:601)
        at scala.concurrent.BatchingExecutor$class.execute(BatchingExecutor.scala:109)
        at scala.concurrent.Future$InternalCallbackExecutor$.execute(Future.scala:599)
        at akka.actor.LightArrayRevolverScheduler$TaskHolder.executeTask(LightArrayRevolverScheduler.scala:329)
        at akka.actor.LightArrayRevolverScheduler$$anon$4.executeBucket$1(LightArrayRevolverScheduler.scala:280)
        at akka.actor.LightArrayRevolverScheduler$$anon$4.nextTick(LightArrayRevolverScheduler.scala:284)
        at akka.actor.LightArrayRevolverScheduler$$anon$4.run(LightArrayRevolverScheduler.scala:236)
        at java.lang.Thread.run(Thread.java:748)
Caused by: akka.pattern.AskTimeoutException: Ask timed out on [Actor[akka://flink/user/dispatcher#-1725880087]] after [10000 ms]. Sender[null] sent message of type "org.apache.flink.runtime.rpc.messages.LocalFencedMessage".
        at akka.pattern.PromiseActorRef$$anonfun$1.apply$mcV$sp(AskSupport.scala:604)
        ... 9 more
2020-03-04 08:39:07,676 ERROR org.apache.flink.runtime.rest.handler.job.JobsOverviewHandler  - Could not retrieve the redirect address.
java.util.concurrent.CompletionException: akka.pattern.AskTimeoutException: Ask timed out on [Actor[akka://flink/user/dispatcher#-1725880087]] after [10000 ms]. Sender[null] sent message of type "org.apache.flink.runtime.rpc.messages.LocalFencedMessage".
        at java.util.concurrent.CompletableFuture.encodeThrowable(CompletableFuture.java:292)
        at java.util.concurrent.CompletableFuture.completeThrowable(CompletableFuture.java:308)
        at java.util.concurrent.CompletableFuture.uniApply(CompletableFuture.java:593)
        at java.util.concurrent.CompletableFuture$UniApply.tryFire(CompletableFuture.java:577)
        at java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:474)
        at java.util.concurrent.CompletableFuture.completeExceptionally(CompletableFuture.java:1977)
        at org.apache.flink.runtime.concurrent.FutureUtils$1.onComplete(FutureUtils.java:770)
        at akka.dispatch.OnComplete.internal(Future.scala:258)
        at akka.dispatch.OnComplete.internal(Future.scala:256)
        at akka.dispatch.japi$CallbackBridge.apply(Future.scala:186)
        at akka.dispatch.japi$CallbackBridge.apply(Future.scala:183)
        at scala.concurrent.impl.CallbackRunnable.run(Promise.scala:36)
        at org.apache.flink.runtime.concurrent.Executors$DirectExecutionContext.execute(Executors.java:83)
        at scala.concurrent.impl.CallbackRunnable.executeWithValue(Promise.scala:44)
        at scala.concurrent.impl.Promise$DefaultPromise.tryComplete(Promise.scala:252)
        at akka.pattern.PromiseActorRef$$anonfun$1.apply$mcV$sp(AskSupport.scala:603)
        at akka.actor.Scheduler$$anon$4.run(Scheduler.scala:126)
        at scala.concurrent.Future$InternalCallbackExecutor$.unbatchedExecute(Future.scala:601)
        at scala.concurrent.BatchingExecutor$class.execute(BatchingExecutor.scala:109)
        at scala.concurrent.Future$InternalCallbackExecutor$.execute(Future.scala:599)
        at akka.actor.LightArrayRevolverScheduler$TaskHolder.executeTask(LightArrayRevolverScheduler.scala:329)
        at akka.actor.LightArrayRevolverScheduler$$anon$4.executeBucket$1(LightArrayRevolverScheduler.scala:280)
        at akka.actor.LightArrayRevolverScheduler$$anon$4.nextTick(LightArrayRevolverScheduler.scala:284)
        at akka.actor.LightArrayRevolverScheduler$$anon$4.run(LightArrayRevolverScheduler.scala:236)
        at java.lang.Thread.run(Thread.java:748)
Caused by: akka.pattern.AskTimeoutException: Ask timed out on [Actor[akka://flink/user/dispatcher#-1725880087]] after [10000 ms]. Sender[null] sent message of type "org.apache.flink.runtime.rpc.messages.LocalFencedMessage".
        at akka.pattern.PromiseActorRef$$anonfun$1.apply$mcV$sp(AskSupport.scala:604)
 

 

 
Warm Regards,
Samir Chauhan
 
Regional Infrastructure & Operations
 
<image001.png>
 
Prudential Services Singapore Pte Ltd
1 Wallich Street #19-01, Guoco Tower Singapore 078881
 
Direct (65) 6704 7264 Mobile (65) 9721 7548
 

 


There's a reason we support Fair Dealing. YOU.


This email and any files transmitted with it or attached to it (the [Email]) may contain confidential, proprietary or legally privileged information and is intended solely for the use of the individual or entity to whom it is addressed. If you are not the intended recipient of the Email, you must not, directly or indirectly, copy, use, print, distribute, disclose to any other party or take any action in reliance on any part of the Email. Please notify the system manager or sender of the error and delete all copies of the Email immediately.  

No statement in the Email should be construed as investment advice being given within or outside Singapore. Prudential Assurance Company Singapore (Pte) Limited (PACS) and each of its related entities shall not be responsible for any losses, claims, penalties, costs or damages arising from or in connection with the use of the Email or the information therein, in whole or in part. You are solely responsible for conducting any virus checks prior to opening, accessing or disseminating the Email.

PACS (Company Registration No. 199002477Z) is a company incorporated under the laws of Singapore and has its registered office at 30 Cecil Street, #30-01, Prudential Tower, Singapore 049712.

PACS is an indirect wholly owned subsidiary of Prudential plc of the United Kingdom. PACS and Prudential plc are not affiliated in any manner with Prudential Financial, Inc., a company whose principal place of business is in the United States of America.