job manager timeout

Radu Tudoran

Hi,

I am running a program that works fine locally, but when I try to run it on the cluster I get a timeout error from the client that tries to connect to the JobManager. Contacting the JobManager from that machine is not the problem, as it works just fine for other streaming applications. Because the stream topology is rather complex, I suspect there is an issue with deploying it. I am not sure whether this is normal behavior (IMHO it should not fail just because the topology is more complex). In case the error helps to identify the underlying issue (if any), please see it below.

Meanwhile, could you please tell me how I can configure the timeout so that it won't fail anymore?

Thanks

org.apache.flink.client.program.ProgramInvocationException: The program execution failed: Communication with JobManager failed: Job submission to the JobManager timed out.
        at org.apache.flink.client.program.Client.runBlocking(Client.java:370)
        at org.apache.flink.streaming.api.environment.StreamContextEnvironment.execute(StreamContextEnvironment.java:96)
        at application.MainStreamApp.main(MainStreamApp.java:108)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:606)
        at org.apache.flink.client.program.PackagedProgram.callMainMethod(PackagedProgram.java:497)
        at org.apache.flink.client.program.PackagedProgram.invokeInteractiveModeForExecution(PackagedProgram.java:395)
        at org.apache.flink.client.program.Client.runBlocking(Client.java:252)
        at org.apache.flink.client.CliFrontend.executeProgramBlocking(CliFrontend.java:676)
        at org.apache.flink.client.CliFrontend.run(CliFrontend.java:326)
        at org.apache.flink.client.CliFrontend.parseParameters(CliFrontend.java:978)
        at org.apache.flink.client.CliFrontend.main(CliFrontend.java:1028)
Caused by: org.apache.flink.runtime.client.JobExecutionException: Communication with JobManager failed: Job submission to the JobManager timed out.
        at org.apache.flink.runtime.client.JobClient.submitJobAndWait(JobClient.java:140)
        at org.apache.flink.client.program.Client.runBlocking(Client.java:368)
        ... 13 more
Caused by: org.apache.flink.runtime.client.JobClientActorSubmissionTimeoutException: Job submission to the JobManager timed out.
        at org.apache.flink.runtime.client.JobClientActor.handleMessage(JobClientActor.java:255)
        at org.apache.flink.runtime.akka.FlinkUntypedActor.handleLeaderSessionID(FlinkUntypedActor.java:88)
        at org.apache.flink.runtime.akka.FlinkUntypedActor.onReceive(FlinkUntypedActor.java:68)
        at akka.actor.UntypedActor$$anonfun$receive$1.applyOrElse(UntypedActor.scala:167)
        at akka.actor.Actor$class.aroundReceive(Actor.scala:465)
        at akka.actor.UntypedActor.aroundReceive(UntypedActor.scala:97)
        at akka.actor.ActorCell.receiveMessage(ActorCell.scala:516)
        at akka.actor.ActorCell.invoke(ActorCell.scala:487)
        at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:254)
        at akka.dispatch.Mailbox.run(Mailbox.scala:221)
        at akka.dispatch.Mailbox.exec(Mailbox.scala:231)
        at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
        at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.pollAndExecAll(ForkJoinPool.java:1253)
        at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1346)
        at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
        at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)

Dr. Radu Tudoran

Research Engineer - Big Data Expert

IT R&D Division

 


HUAWEI TECHNOLOGIES Duesseldorf GmbH

European Research Center



Re: job manager timeout

rmetzger0
Hi Radu,

Did you check the JobManager logs as well? Maybe you can see there why the JobManager is failing.

The timeout is configurable through the "akka.client.timeout" configuration option. The default value is "60 s".
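
As a minimal sketch, assuming the client picks up its settings from conf/flink-conf.yaml on the machine you submit the job from, raising the timeout could look like the fragment below. The value "600 s" is only an illustrative choice, not a recommendation, and the file location is an assumption about your setup.

```yaml
# conf/flink-conf.yaml on the submitting machine (location is an assumption)
# Timeout the client uses when communicating with the JobManager.
# Default is "60 s"; "600 s" is an arbitrary example value.
akka.client.timeout: 600 s
```

A longer timeout only gives the submission more time to complete; if the JobManager itself is failing, its logs are still the place to look for the cause.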

On Wed, Feb 10, 2016 at 7:35 PM, Radu Tudoran <[hidden email]> wrote:
