I am wondering if I can customize the job_id in job cluster mode. Currently it is always 00000000000000000000000000000000. I am running multiple job clusters that share the same S3 bucket, which means checkpoints from different jobs end up under the same path, e.g. 00000000000000000000000000000000/chk-64. How can I avoid this?
Thanks
Hey Hao Sun,
this has been changed recently [1] in order to properly support failover in job cluster mode. A workaround for you would be to add an application identifier to the checkpoint path of each application, resulting in S3 paths like application-XXXX/00...00/chk-64. Is that a feasible solution?

As a side note: it was considered to keep the job ID fixed but make it configurable (e.g. by providing a --job-id argument), which would also help to avoid this situation, but I'm not aware of any concrete plans to move forward with that approach.

Best,

Ufuk

[1] https://issues.apache.org/jira/projects/FLINK/issues/FLINK-10291
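For concreteness, here is a minimal sketch of that workaround as flink-conf.yaml entries. The bucket name and the application-1 identifier below are hypothetical, and it assumes the standard state.checkpoints.dir option, under which each job's checkpoints get their own subdirectory:

    # flink-conf.yaml for the job cluster running application 1
    # ("my-bucket" and "application-1" are placeholders you choose per application)
    state.checkpoints.dir: s3://my-bucket/application-1/checkpoints

    # checkpoints then land under paths like
    # s3://my-bucket/application-1/checkpoints/<job-id>/chk-64

With a distinct identifier per job cluster, the fixed job ID no longer causes the checkpoint paths of different applications to collide.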
Thanks, that also works. To avoid the same issue with ZooKeeper, I assume I have to do the same trick?
Hi Hao Sun,

When you use the job cluster mode, you should be sure to isolate the ZooKeeper path for different jobs. Ufuk is correct: we fixed the JobID for the purpose of finding the JobGraph during failover. In fact, FLINK-10291 should be combined with FLINK-10292 [1].

To Till: I hope FLINK-10292 can be reviewed as soon as possible.

Thanks,
vino.

[1] https://issues.apache.org/jira/browse/FLINK-10292
Yes, exactly. The following configuration entry [1] takes care of this:

high-availability.cluster-id: application-1

This will result in ZooKeeper entries under /flink/application-1/[...].

[1] https://ci.apache.org/projects/flink/flink-docs-master/ops/jobmanager_high_availability.html#config-file-flink-confyaml
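Putting both suggestions together, each job cluster would then carry its own identifier in both the checkpoint path and the ZooKeeper cluster id. A sketch, again with hypothetical bucket, quorum, and identifier names:

    # flink-conf.yaml for the job cluster running application 1
    high-availability: zookeeper
    high-availability.zookeeper.quorum: zk-1:2181,zk-2:2181,zk-3:2181
    high-availability.cluster-id: application-1
    state.checkpoints.dir: s3://my-bucket/application-1/checkpoints

    # a second job cluster would use application-2 instead, keeping its
    # ZooKeeper entries (/flink/application-2/...) and its S3 checkpoints
    # isolated from the first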
Thanks all.