Hi all, When I execute my Flink job using IntelliJ IDEA stand-alone mode, the job is executed successfully, but when I try to attach it to a stand-alone Flink cluster, my job fails with a Flink exception that “the assigned slot
was removed”. Does anyone have any idea why I am facing this issue? Thank you in advance, Konstantinos P.S.: The full stack of the exception observed is the following: org.apache.flink.client.program.ProgramInvocationException: The main method caused an error. at org.apache.flink.client.program.PackagedProgram.callMainMethod(PackagedProgram.java:546) at org.apache.flink.client.program.PackagedProgram.invokeInteractiveModeForExecution(PackagedProgram.java:421) at org.apache.flink.client.program.ClusterClient.run(ClusterClient.java:427) at org.apache.flink.client.cli.CliFrontend.executeProgram(CliFrontend.java:813) at org.apache.flink.client.cli.CliFrontend.runProgram(CliFrontend.java:287) at org.apache.flink.client.cli.CliFrontend.run(CliFrontend.java:213) at org.apache.flink.client.cli.CliFrontend.parseParameters(CliFrontend.java:1050) at org.apache.flink.client.cli.CliFrontend.lambda$main$11(CliFrontend.java:1126) at org.apache.flink.runtime.security.NoOpSecurityContext.runSecured(NoOpSecurityContext.java:30) at org.apache.flink.client.cli.CliFrontend.main(CliFrontend.java:1126) Caused by: java.lang.IllegalStateException: Failed to execute ApplicationRunner at org.springframework.boot.SpringApplication.callRunner(SpringApplication.java:807) at org.springframework.boot.SpringApplication.callRunners(SpringApplication.java:794) at org.springframework.boot.SpringApplication.run(SpringApplication.java:324) at org.springframework.boot.SpringApplication.run(SpringApplication.java:1260) at org.springframework.boot.SpringApplication.run(SpringApplication.java:1248) at com.iri.aa.etl.EtlApplication.main(EtlApplication.java:22) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(Unknown Source) at sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source) at java.lang.reflect.Method.invoke(Unknown Source) at org.apache.flink.client.program.PackagedProgram.callMainMethod(Packag edProgram.java:529) ... 9 more Caused by: com.iri.aa.etl.exception.IriExecuteException: org.apache.flink.client.program.ProgramInvocationException: Job failed. (JobID: 860194c1a0ad72339a66d31ee11fda3a) at com.iri.aa.etl.rgm.AbstractRgmStoreJob.abstractExecute(AbstractRgmStoreJob.java:84) at com.iri.aa.etl.rgm.ThresholdAcvCalcCurrentJob.executeDry(ThresholdAcvCalcCurrentJob.java:37) at com.iri.aa.etl.job.JobExecutor.lambda$executeDryRunners$7(JobExecutor.java:44) at java.util.ArrayList.forEach(Unknown Source) at com.iri.aa.etl.job.JobExecutor.executeDryRunners(JobExecutor.java:44) at com.iri.aa.etl.job.JobExecutor.run(JobExecutor.java:35) at org.springframework.boot.SpringApplication.callRunner(SpringApplication.java:804) ... 19 more Caused by: org.apache.flink.client.program.ProgramInvocationException: Job failed. (JobID: 860194c1a0ad72339a66d31ee11fda3a) at org.apache.flink.client.program.rest.RestClusterClient.submitJob(RestClusterClient.java:268) at org.apache.flink.client.program.ClusterClient.run(ClusterClient.java:487) at org.apache.flink.client.program.ClusterClient.run(ClusterClient.java:475) at org.apache.flink.client.program.ContextEnvironment.execute(ContextEnvironment.java:62) at com.iri.aa.etl.rgm.AbstractRgmStoreJob.abstractExecute(AbstractRgmStoreJob.java:82) ... 25 more Caused by: org.apache.flink.runtime.client.JobExecutionException: Job execution failed. at org.apache.flink.runtime.jobmaster.JobResult.toJobExecutionResult(JobResult.java:146) at org.apache.flink.client.program.rest.RestClusterClient.submitJob(RestClusterClient.java:265) ... 29 more Caused by: org.apache.flink.util.FlinkException: The assigned slot 94e8c776e67d1c2e160a3b492f6e7d7c_0 was removed. at org.apache.flink.runtime.resourcemanager.slotmanager.SlotManager.removeSlot(SlotManager.java:893) at org.apache.flink.runtime.resourcemanager.slotmanager.SlotManager.removeSlots(SlotManager.java:863) at org.apache.flink.runtime.resourcemanager.slotmanager.SlotManager.internalUnregisterTaskManager(SlotManager.java:1058) at org.apache.flink.runtime.resourcemanager.slotmanager.SlotManager.unregisterTaskManager(SlotManager.java:385) at org.apache.flink.runtime.resourcemanager.ResourceManager.closeTaskManagerConnection(ResourceManager.java:825) at org.apache.flink.runtime.resourcemanager.ResourceManager$TaskManagerHeartbeatListener$1.run(ResourceManager.java:1139) at org.apache.flink.runtime.rpc.akka.AkkaRpcActor.handleRunAsync(AkkaRpcActor.java:332) at org.apache.flink.runtime.rpc.akka.AkkaRpcActor.handleRpcMessage(AkkaRpcActor.java:158) at org.apache.flink.runtime.rpc.akka.FencedAkkaRpcActor.handleRpcMessage(FencedAkkaRpcActor.java:70) at org.apache.flink.runtime.rpc.akka.AkkaRpcActor.onReceive(AkkaRpcActor.java:142) at org.apache.flink.runtime.rpc.akka.FencedAkkaRpcActor.onReceive(FencedAkkaRpcActor.java:40) at akka.actor.UntypedActor$$anonfun$receive$1.applyOrElse(UntypedActor.scala:165) at akka.actor.Actor$class.aroundReceive(Actor.scala:502) at akka.actor.UntypedActor.aroundReceive(UntypedActor.scala:95) at akka.actor.ActorCell.receiveMessage(ActorCell.scala:526) at akka.actor.ActorCell.invoke(ActorCell.scala:495) at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:257) at akka.dispatch.Mailbox.run(Mailbox.scala:224) at akka.dispatch.Mailbox.exec(Mailbox.scala:234) at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260) at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339) at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979) at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107) |
Hi Konstantinos, looks like your using Spring to build your Flink job. Do you maybe use Spring's dependency injection mechanism to inject objects into objects, which are serialization and shipped to the taskmanagers? I could imagine this being the problem. In general, when a slot is removed this usually indicated a problem on the taskmanager side. Did you have a look at these logs as well? Best, Konstantin On Mon, Apr 8, 2019 at 1:03 PM Papadopoulos, Konstantinos <[hidden email]> wrote:
-- Konstantin Knauf | Solutions Architect +49 160 91394525 Follow us @VervericaData -- Join Flink Forward - The Apache Flink Conference Stream Processing | Event Driven | Real Time -- Data Artisans GmbH | Invalidenstrasse 115, 10115 Berlin, Germany -- Data Artisans GmbHRegistered at Amtsgericht Charlottenburg: HRB 158244 B Managing Directors: Dr. Kostas Tzoumas, Dr. Stephan Ewen |
Free forum by Nabble | Edit this page |