killing process in Flink cluster

Ramkumar
Hi All,

I am new to Flink. I am running a streaming WordCount program on a cluster. It was taking a long time, so I stopped the job manually. However, it is still in the canceling state: of the two subtasks on the cluster, one was canceled successfully but the other is still canceling. We tried to kill the process from the command prompt using grep and kill, but could not kill it. Please help me kill that process.

Best Regards,
Ramkumar L

Re: killing process in Flink cluster

rmetzger0
Hi,

So you tried to stop Flink by killing the processes?
I assume you've started Flink in standalone cluster mode?
A kill, and certainly a kill -9, should definitely stop Flink.
Did you check the log files of the task manager? The Flink services log when they receive signals from the operating system. If the cancelling doesn't work for some reason, that will be logged as well.

In general, I would recommend cancelling the job from the CLI or web interface. In some cases the job might not stop, for example when the user code blocks indefinitely (a good example is a while(true){} loop that is not stopped by a cancel() call).

What is your Flink job doing? Does it contain a custom source?
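
To illustrate the blocking-loop case above: Flink's SourceFunction interface has a run() method that holds the loop and a cancel() method that the framework calls when the job is cancelled. The sketch below mirrors that pattern in plain Java without any Flink dependency, so the class name CancellableLoop, the counter, and the sleep are hypothetical placeholders rather than Flink code. The point is only that the loop checks a volatile flag that cancel() clears, instead of spinning on while(true).

```java
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical stand-in for a custom source; it does not use the Flink API.
class CancellableLoop implements Runnable {

    // volatile so a write from the cancelling thread is visible to the loop thread
    private volatile boolean running = true;
    private final AtomicLong emitted = new AtomicLong();

    @Override
    public void run() {
        // Check the flag on every iteration instead of looping on while (true)
        while (running) {
            emitted.incrementAndGet(); // placeholder for emitting one record
            try {
                Thread.sleep(1); // placeholder for real work
            } catch (InterruptedException e) {
                // Treat an interrupt as a request to stop as well
                Thread.currentThread().interrupt();
                return;
            }
        }
    }

    // Called from another thread to ask the loop to finish
    public void cancel() {
        running = false;
    }

    public long emittedCount() {
        return emitted.get();
    }

    public static void main(String[] args) throws InterruptedException {
        CancellableLoop loop = new CancellableLoop();
        Thread worker = new Thread(loop, "source-loop");
        worker.start();
        Thread.sleep(50);   // let the loop run briefly
        loop.cancel();      // request a stop, as a framework would on job cancellation
        worker.join(1000);  // wait for the loop to notice the flag
        System.out.println("worker alive after cancel: " + worker.isAlive());
    }
}
```

If the loop body never returns to the flag check (for example because it blocks forever inside a single call), cancel() has no effect, which matches the stuck canceling state described in this thread.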




On Fri, May 13, 2016 at 11:50 AM, Ramkumar <[hidden email]> wrote:
[...]

--
View this message in context: http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/killing-process-in-Flink-cluster-tp6889.html
Sent from the Apache Flink User Mailing List archive at Nabble.com.


Re: killing process in Flink cluster

Ramkumar
Hi,

As you mentioned, my program contains a loop with infinite iterations. I was not able to stop it with the kill -9 command. Instead, I killed the entire Flink cluster. I set up the cluster again with one master and three slaves. Now when I try to run the code, it shows the following exception.

 The program finished with the following exception:

org.apache.flink.client.program.ProgramInvocationException: The program execution failed: Communication with JobManager failed: Lost connection to the JobManager.
        at org.apache.flink.client.program.Client.runBlocking(Client.java:381)
        at org.apache.flink.client.program.Client.runBlocking(Client.java:355)
        at org.apache.flink.streaming.api.environment.StreamContextEnvironment.execute(StreamContextEnvironment.java:65)
        at org.apache.flink.project05.WordCount.main(WordCount.java:95)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:606)
        at org.apache.flink.client.program.PackagedProgram.callMainMethod(PackagedProgram.java:505)
        at org.apache.flink.client.program.PackagedProgram.invokeInteractiveModeForExecution(PackagedProgram.java:403)
        at org.apache.flink.client.program.Client.runBlocking(Client.java:248)
        at org.apache.flink.client.CliFrontend.executeProgramBlocking(CliFrontend.java:866)
        at org.apache.flink.client.CliFrontend.run(CliFrontend.java:333)
        at org.apache.flink.client.CliFrontend.parseParameters(CliFrontend.java:1189)
        at org.apache.flink.client.CliFrontend.main(CliFrontend.java:1239)
Caused by: org.apache.flink.runtime.client.JobExecutionException: Communication with JobManager failed: Lost connection to the JobManager.
        at org.apache.flink.runtime.client.JobClient.submitJobAndWait(JobClient.java:141)
        at org.apache.flink.client.program.Client.runBlocking(Client.java:379)
        ... 14 more
Caused by: org.apache.flink.runtime.client.JobClientActorConnectionTimeoutException: Lost connection to the JobManager.
        at org.apache.flink.runtime.client.JobClientActor.handleMessage(JobClientActor.java:244)
        at org.apache.flink.runtime.akka.FlinkUntypedActor.handleLeaderSessionID(FlinkUntypedActor.java:88)
        at org.apache.flink.runtime.akka.FlinkUntypedActor.onReceive(FlinkUntypedActor.java:68)
        at akka.actor.UntypedActor$$anonfun$receive$1.applyOrElse(UntypedActor.scala:167)
        at akka.actor.Actor$class.aroundReceive(Actor.scala:465)
        at akka.actor.UntypedActor.aroundReceive(UntypedActor.scala:97)
        at akka.actor.ActorCell.receiveMessage(ActorCell.scala:516)
        at akka.actor.ActorCell.invoke(ActorCell.scala:487)
        at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:254)
        at akka.dispatch.Mailbox.run(Mailbox.scala:221)
        at akka.dispatch.Mailbox.exec(Mailbox.scala:231)
        at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
        at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.pollAndExecAll(ForkJoinPool.java:1253)
        at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1346)
        at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
        at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)

Re: killing process in Flink cluster

Till Rohrmann

Could you check the logs in FLINK_HOME/log/? They might tell you what went wrong after submitting the job to the JobManager.

Cheers,
Till
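
As a rough aid for the log check suggested above, here is a minimal sketch that lists the lines mentioning ERROR or Exception in a log directory. It rests on assumptions that are not from this thread: that FLINK_HOME is set as an environment variable, that the log files end in .log, and that they are UTF-8 text. The class name LogScan and the method findProblems are hypothetical.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper; not part of Flink.
class LogScan {

    // Returns "file: line" for each line in *.log files directly under logDir
    // that contains "ERROR" or "Exception"
    static List<String> findProblems(Path logDir) throws IOException {
        List<String> hits = new ArrayList<>();
        try (DirectoryStream<Path> logs = Files.newDirectoryStream(logDir, "*.log")) {
            for (Path log : logs) {
                for (String line : Files.readAllLines(log, StandardCharsets.UTF_8)) {
                    if (line.contains("ERROR") || line.contains("Exception")) {
                        hits.add(log.getFileName() + ": " + line);
                    }
                }
            }
        }
        return hits;
    }

    public static void main(String[] args) throws IOException {
        // Assumption: FLINK_HOME points at the Flink installation directory
        String flinkHome = System.getenv("FLINK_HOME");
        if (flinkHome == null) {
            System.err.println("FLINK_HOME is not set");
            return;
        }
        for (String hit : findProblems(Paths.get(flinkHome, "log"))) {
            System.out.println(hit);
        }
    }
}
```

Running it on the master and on each worker machine would cover both the JobManager and the task manager logs, since each machine writes its own files.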


On Wed, May 18, 2016 at 10:05 AM, Ramkumar <[hidden email]> wrote:
[...]