Hi,

I have been experimenting with HA lately and see that it successfully recovers a job when the JobManager restarts. My question is whether this also works for a job cluster.

Following the instructions at https://github.com/apache/flink/blob/release-1.8/flink-container/docker/README.md, I can see in https://github.com/apache/flink/blob/release-1.8/flink-container/docker/docker-entrypoint.sh that in this case the following command is invoked:

exec $FLINK_HOME/bin/standalone-job.sh start-foreground "$@"

This would mean that if the JobManager restarts, the following happens:

1. It uses HA to restore the job that was running.
2. A new job is submitted, overwriting the restored job and bypassing the checkpoint restore.

Am I missing something here?
|
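To make the forwarding in that command concrete, here is a simplified sketch of how an entrypoint of this kind passes its arguments through. It is not the actual docker-entrypoint.sh: the function name `entrypoint`, the mode words, the `/opt/flink` default and the class name `com.example.MyJob` are assumptions for illustration, and the command is printed rather than exec'd so the sketch runs without Flink installed.

```shell
# Hypothetical stand-in for a container entrypoint (not the real script).
FLINK_HOME="${FLINK_HOME:-/opt/flink}"   # assumed install location

entrypoint() {
    mode="$1"
    shift   # drop the mode word; everything else is forwarded unchanged
    case "$mode" in
        job-cluster)
            # The real script would exec this command instead of printing it.
            echo "$FLINK_HOME/bin/standalone-job.sh start-foreground $*"
            ;;
        task-manager)
            echo "$FLINK_HOME/bin/taskmanager.sh start-foreground $*"
            ;;
        *)
            echo "unknown mode: $mode" >&2
            return 1
            ;;
    esac
}

# Every container start, including a restart, goes through the same path.
entrypoint job-cluster --job-classname com.example.MyJob
```

The point of the sketch is only that a restarted container runs the same start command with the same job arguments as the first start, which is what prompts the question above.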
Hello Boris,

I think you are confused by the name of the shell script "standalone-job.sh". It basically means that we start a "standalone job manager", as stated in the first comment of https://github.com/apache/flink/blob/release-1.8/flink-dist/src/main/flink-bin/bin/standalone-job.sh

It is another version of flink-dist/src/main/flink-bin/bin/jobmanager.sh. It is not related to a job.

When you configure HA on a Flink cluster and submit a job, Flink (i.e. the JobManager) stores the state of the job in ZooKeeper / HDFS. So when it crashes and comes back (with this entrypoint), it reads from ZK / HDFS and restores the previous execution.

Regards,
Bastien

------------------

On Fri, May 24, 2019 at 11:22 PM, Boris Lublinsky <[hidden email]> wrote:
|
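For readers following along, the HA setup mentioned in the reply above usually comes down to a few entries in flink-conf.yaml. The sketch below writes such a fragment to a temporary directory. The ZooKeeper hosts, the HDFS path and the cluster id are placeholder values assumed for illustration; only the key names are taken from Flink's configuration.

```shell
# Write an example HA fragment to a scratch directory (values are placeholders).
conf_dir="$(mktemp -d)"

cat > "$conf_dir/flink-conf.yaml" <<'EOF'
# Use ZooKeeper for leader election and job metadata pointers
high-availability: zookeeper
# Assumed ZooKeeper hosts; replace with your own quorum
high-availability.zookeeper.quorum: zk-1:2181,zk-2:2181,zk-3:2181
# Assumed durable location for job graphs and checkpoint metadata
high-availability.storageDir: hdfs:///flink/ha/
# Assumed id that separates this cluster's entries from others in ZooKeeper
high-availability.cluster-id: /my-job-cluster
EOF

# Show the HA-related lines that were written
grep '^high-availability' "$conf_dir/flink-conf.yaml"
```

In a container setup these lines would need to end up in the image's conf directory or be supplied some other way; how that is done depends on the image and is not covered here.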
Respected all,

I am new to Apache Flink. I want to run the existing graph algorithms (the examples shipped with the Flink download) on my own data, but I cannot work out how to run those example algorithms on my input data. Kindly suggest a solution.

From: "bastien dine" <[hidden email]>
To: "Boris Lublinsky" <[hidden email]>
Cc: "user" <[hidden email]>
Sent: Saturday, May 25, 2019 1:15:32 PM
Subject: Re: Job cluster and HA
|
In reply to this post by bastien dine
I understand this.
What I am not sure about is the sequence: it will come back and restore from ZK/HDFS, and then it will try to start the job specified in the class, right? That would (potentially) overwrite the restored job. How does it know not to start the job defined in the class once the previous one has been restored?
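One general way a single-job cluster could avoid the overwrite described in this question is to give the job a fixed ID, so that a re-submission after a restart is recognised as the same job and resumed from its latest checkpoint. The toy model below illustrates that idea only. It is an assumption about the technique, not a description of what Flink 1.8 actually does; the directory standing in for ZooKeeper/HDFS, the function names and the all-zero job ID are invented for the sketch.

```shell
# Toy model: a directory plays the role of the HA store (ZooKeeper/HDFS).
ha_store="$(mktemp -d)"
JOB_ID="00000000000000000000000000000000"   # assumed fixed ID for the single job

# $1 = HA store directory, $2 = job ID
start_job_cluster() {
    if [ -f "$1/$2.checkpoint" ]; then
        # Same ID already known: resume instead of starting over.
        echo "recover $2 from checkpoint $(cat "$1/$2.checkpoint")"
    else
        echo "submit $2 fresh"
        echo 0 > "$1/$2.checkpoint"
    fi
}

# $1 = HA store directory, $2 = job ID, $3 = checkpoint number
complete_checkpoint() {
    echo "$3" > "$1/$2.checkpoint"
}

start_job_cluster "$ha_store" "$JOB_ID"        # first start: nothing to recover
complete_checkpoint "$ha_store" "$JOB_ID" 7    # job makes progress
start_job_cluster "$ha_store" "$JOB_ID"        # restart: same ID, so it resumes
```

If the ID were random on every start, the second call would take the "fresh" branch, which is exactly the overwrite scenario the question worries about.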
|
In reply to this post by RAMALINGESWARA RAO THOTTEMPUDI
Hey, this page explains how to run a Flink job: https://ci.apache.org/projects/flink/flink-docs-master/getting-started/tutorials/local_setup.html

On Sat, May 25, 2019 at 1:28 PM RAMALINGESWARA RAO THOTTEMPUDI <[hidden email]> wrote:
|
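As a concrete starting point for the graph-examples question in this thread, the sketch below prepares a tiny input in the text format the bundled ConnectedComponents batch example reads (one vertex ID per line, one space-separated edge per line) and prints the command that would run it. The `/opt/flink` default, the temporary data directory, the helper name `cc_command` and the sample graph are assumptions; the command is printed rather than executed so the sketch works without a Flink installation.

```shell
FLINK_HOME="${FLINK_HOME:-/opt/flink}"   # assumed install location
data_dir="$(mktemp -d)"                  # assumed scratch directory for your data

# Vertices: one ID per line. Edges: "source target", one pair per line.
printf '1\n2\n3\n4\n' > "$data_dir/vertices.txt"
printf '1 2\n2 3\n'   > "$data_dir/edges.txt"

# Hypothetical helper that builds the run command for the given data directory.
cc_command() {
    echo "$FLINK_HOME/bin/flink run $FLINK_HOME/examples/batch/ConnectedComponents.jar" \
         "--vertices $1/vertices.txt --edges $1/edges.txt --output $1/result.txt --iterations 10"
}

# Print the command; run it yourself once a local cluster is up.
cc_command "$data_dir"
```

Other bundled examples take different parameter names, so check the usage text each example prints when it is started without arguments before adapting this.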