Hi,
What is the best way to prevent launching two jobs with the same name concurrently? Instead of doing a check in the script that starts the Flink job, I would prefer the submission to fail if another job with the same name is in progress (an exception or something like that). David
Hi David, Internally, Flink identifies jobs by job id, not by job name, so it only checks whether two jobs have the same job id. If you submit the job via the CLI [1], I'm afraid there is no built-in way to do this: the job id is generated randomly at submission time and has nothing to do with the job name. However, if you submit the job via the REST API [2], there is an option to specify the job id when submitting, so you can generate the job id yourself. Regards, Dian [1] https://ci.apache.org/projects/flink/flink-docs-master/ops/cli.html [2] https://ci.apache.org/projects/flink/flink-docs-master/monitoring/rest_api.html#jars-jarid-run
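As a rough illustration of Dian's suggestion, one way to get a stable job id from a job name is to hash the name into the 32-hex-character format Flink uses for job ids, so that resubmitting the same name always produces the same id. This is only a sketch; the job name and the choice of MD5 (which conveniently yields 16 bytes) are my own, not anything Flink provides:

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

public class DeterministicJobId {

    // Derive a 32-hex-character id (the textual format of Flink's JobID)
    // deterministically from a job name by hashing it with MD5 (16 bytes).
    static String jobIdFromName(String jobName) {
        try {
            MessageDigest md5 = MessageDigest.getInstance("MD5");
            byte[] digest = md5.digest(jobName.getBytes(StandardCharsets.UTF_8));
            StringBuilder hex = new StringBuilder(32);
            for (byte b : digest) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("MD5 not available", e);
        }
    }

    public static void main(String[] args) {
        String id = jobIdFromName("my-streaming-job");
        System.out.println(id);
        // The same name always yields the same id, so submitting twice with
        // this id lets the cluster itself reject the duplicate submission.
        System.out.println(id.equals(jobIdFromName("my-streaming-job")));
    }
}
```

The generated string could then be passed as the job id field of the REST submission; the cluster would refuse a second job with an id that is already in use.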
The situation is as Dian said: Flink identifies jobs by job id instead of job name. However, I still think it is a valid question whether Flink could alternatively identify jobs by job name and leave the work of distinguishing jobs by name to users. The advantages of this approach include a more readable display and interaction, as well as reducing some hardcoded handling of job ids; for example, we always set the job id to new JobID(0, 0) in standalone per-job mode to get the same ZK path. Best, tison. Dian Fu <[hidden email]> wrote on Mon, Sep 23, 2019, at 10:55 AM:
Hi,
Thanks for your replies. Yes, it would be useful to have a way to define the job id; I would then have been able to derive the job id from the job name, for example. At the moment we do not use the REST API but the CLI to submit our jobs on YARN. Nevertheless, I can implement a little trick: at startup, query the REST API and throw an exception if a job with the same name is already running. Question: is there a way to retrieve the JobManager URI from my code, or should I provide it as a parameter? Thanks. David
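David's trick could look roughly like the sketch below. It assumes the launcher has already fetched and parsed the response of the JobManager's /jobs/overview REST endpoint; to keep the example self-contained, each job entry is modeled as a plain map with "name" and "state" keys, and the actual HTTP call and JSON parsing are omitted:

```java
import java.util.List;
import java.util.Map;

public class DuplicateJobCheck {

    // Throws if a RUNNING job with the given name already exists among the
    // job entries reported by the cluster. Finished or failed jobs, which
    // the REST API also reports, do not block a new submission.
    static void failIfAlreadyRunning(String jobName, List<Map<String, String>> jobs) {
        for (Map<String, String> job : jobs) {
            if (jobName.equals(job.get("name")) && "RUNNING".equals(job.get("state"))) {
                throw new IllegalStateException(
                        "A job named '" + jobName + "' is already running");
            }
        }
    }

    public static void main(String[] args) {
        List<Map<String, String>> jobs = List.of(
                Map.of("name", "etl-job", "state", "RUNNING"),
                Map.of("name", "old-job", "state", "FINISHED"));

        failIfAlreadyRunning("old-job", jobs); // finished jobs do not block
        try {
            failIfAlreadyRunning("etl-job", jobs);
        } catch (IllegalStateException expected) {
            System.out.println("rejected: " + expected.getMessage());
        }
    }
}
```

The startup script would call this check right before submission and let the exception abort the launch.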
Hi David, you could use Flink's RestClusterClient and call #listJobs to obtain the list of jobs being executed on the cluster (note that it will also report finished jobs). By providing a properly configured Configuration (e.g. loading flink-conf.yaml via GlobalConfiguration#loadConfiguration), it will automatically detect where the JobManager is running (e.g. via ZooKeeper if HA is enabled, or it picks up the configured JobManager address from the configuration). Of course, you could also provide the JobManager address as a parameter. Cheers, Till
Thanks Till,
Perfect, I am going to use RestClusterClient with listJobs. It should work perfectly for my needs. Cheers, David
My simple workaround: I always start the applications from the same machine via the CLI and simply take a file-system lock around the check-if-job-is-already-running and job-launching steps. This is of course a possible single point of failure, since it relies on one machine starting the jobs, but it works in my current environment.
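Theo's file-system lock could be sketched with java.nio's FileChannel#tryLock; the lock file path and class name here are arbitrary choices for illustration:

```java
import java.io.IOException;
import java.nio.channels.FileChannel;
import java.nio.channels.FileLock;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public class LaunchLock {

    // Try to take an exclusive OS-level lock on a well-known file before
    // submitting the job; returns null if another launcher process on this
    // machine already holds it, in which case we abort instead of submitting
    // a duplicate.
    static FileLock tryAcquire(Path lockFile) throws IOException {
        FileChannel channel = FileChannel.open(
                lockFile, StandardOpenOption.CREATE, StandardOpenOption.WRITE);
        FileLock lock = channel.tryLock(); // null if held by another process
        if (lock == null) {
            channel.close();
        }
        return lock;
    }

    public static void main(String[] args) throws IOException {
        Path lockFile = Path.of(System.getProperty("java.io.tmpdir"), "flink-launch.lock");
        FileLock lock = tryAcquire(lockFile);
        if (lock == null) {
            System.out.println("another launch is in progress, aborting");
            return;
        }
        try {
            System.out.println("lock acquired, safe to check and submit the job");
            // ... check running jobs and submit here ...
        } finally {
            lock.release();
        }
    }
}
```

Note that tryLock only guards against other processes: a second overlapping lock attempt from within the same JVM throws OverlappingFileLockException instead of returning null, so the launcher script should run one JVM per launch attempt.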
Best regards, Theo ----- Original message ----- From: "David Morin" <[hidden email]> To: "user" <[hidden email]> Sent: Monday, 23 September 2019, 17:21:17 Subject: Re: How to prevent from launching 2 jobs at the same time