(DEPRECATED) Apache Flink User Mailing List archive.

CLI help, documentation is confusing...

Marco Villalobos-2

CLI help, documentation is confusing...

The Flink CLI documentation says that the -m option specifies the JobManager, but the examples pass in an execution target. I am quite confused by this.

./bin/flink run -m yarn-cluster \
./examples/batch/WordCount.jar \
--input hdfs:///user/hamlet.txt --output hdfs:///user/wordcount_out

So what is it?

I am trying to run Flink in EMR 6.1.0 but I have failed.

It appears as though some of the command line parameters changed from version 1.10 to 1.11.

For example, -yna is now -ynm.

-e is now -t.

But I am still confused by the -m option in the documentation for both versions.

Can somebody please explain?

Kostas Kloudas-2

Re: CLI help, documentation is confusing...

Hi Marco,

I agree with you that the -m help message is misleading, but I do not
think it has changed between releases.
You can specify the address of the JobManager or, for example, you can
pass "-m yarn-cluster" and, depending on your environment setup, Flink
will pick up a session cluster or create a per-job cluster.
This has always been the case.
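
To make the two meanings of -m concrete, here is a minimal dry-run sketch. It only prints the commands rather than submitting anything, and the host name and port are hypothetical placeholders, not values from this thread.

```shell
#!/bin/sh
# Dry-run helper: prints the flink command instead of running it,
# so the two uses of -m can be compared side by side.
FLINK_BIN="./bin/flink"                 # path as used in this thread
JAR="./examples/batch/WordCount.jar"    # example jar from this thread

# $1 is the value passed to -m: a host:port address or the keyword yarn-cluster.
flink_run_cmd() {
  printf '%s\n' "$FLINK_BIN run -m $1 $JAR"
}

# -m as a JobManager address (hypothetical host and port).
flink_run_cmd "jobmanager.example.com:8081"

# -m as the YARN keyword described above.
flink_run_cmd "yarn-cluster"
```

Remove the printf wrapper and run the printed line directly once the right form for your environment is clear.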

As for -t and -e, the only change is that -e was deprecated (although it
still works) in favour of -t. It still has the same meaning.
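
Since -e and -t carry the same meaning, an existing script can be migrated by swapping the flag. The helper below is a rough sketch of that rewrite, not part of Flink; it also maps the long form --executor to --target, which is an assumption about the long option names. It rewrites every matching argument, so it is not suitable if the job itself takes a -e argument.

```shell
#!/bin/sh
# Rewrite the deprecated -e/--executor flag to -t/--target in an argument list.
# Caveat: this also rewrites a literal -e that appears among the job's own arguments.
modernize_args() {
  out=""
  for arg in "$@"; do
    case "$arg" in
      -e) arg="-t" ;;
      --executor) arg="--target" ;;
    esac
    out="${out:+$out }$arg"
  done
  printf '%s\n' "$out"
}

modernize_args run -e yarn-per-job examples/streaming/WindowJoin.jar
```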

Finally on how to run Flink on EMR, I am not an expert so I will pull
in Till who may have some input.

Cheers,
Kostas

On Mon, Nov 9, 2020 at 10:46 PM Marco Villalobos
<[hidden email]> wrote:

> [...]

Till Rohrmann

Re: CLI help, documentation is confusing...

Hi Marco,

As Klou said, -m yarn-cluster should try to deploy a YARN per-job cluster on your YARN cluster. Could you share a bit more detail about what is going wrong? For example, the CLI logs could help to pinpoint the problem.

I've tested that both `bin/flink run -m yarn-cluster examples/streaming/WindowJoin.jar` and `bin/flink run -t yarn-per-job examples/streaming/WindowJoin.jar` start a Flink per-job cluster.

What was -yna supposed to do? -ynm should set the custom name of the Yarn application.
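
For comparison, the two invocations above can be printed side by side with a custom application name. This is a dry-run sketch that only echoes the commands. -ynm is the option named in this thread; passing the name as -Dyarn.application.name together with -t is an assumption about the equivalent generic option and should be checked against `bin/flink run --help` for your version. The application name is hypothetical.

```shell
#!/bin/sh
JAR="examples/streaming/WindowJoin.jar"   # example jar from this thread
APP_NAME="window-join-demo"               # hypothetical application name

# Print the per-job submission command in either the -m or the -t style.
per_job_cmd() {
  case "$1" in
    legacy) printf '%s\n' "bin/flink run -m yarn-cluster -ynm $APP_NAME $JAR" ;;
    # Assumption: with -t, the name is set through the yarn.application.name option.
    target) printf '%s\n' "bin/flink run -t yarn-per-job -Dyarn.application.name=$APP_NAME $JAR" ;;
    *) echo "unknown style: $1" >&2; return 1 ;;
  esac
}

per_job_cmd legacy
per_job_cmd target
```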

[hidden email] should we maybe improve the existing documentation to better reflect the usage of -t/--target? The CLI documentation [1] does not include a single example that uses the target option. We could also think about retiring -m yarn-cluster in favour of -t yarn-per-job, and we should document somewhere which `execution.target` values are supported. What do you think?

[1] https://ci.apache.org/projects/flink/flink-docs-stable/ops/cli.html#job-submission-examples
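
Until the supported values are documented in one place, a small check like the following can catch a mistyped target before submission. The list is an assumption recalled from the Flink 1.11 help text and may be incomplete; `bin/flink run --help` remains the authoritative source for your version.

```shell
#!/bin/sh
# Non-exhaustive list of -t/--target values, recalled from the Flink 1.11
# help text (an assumption; verify against your own installation).
is_known_target() {
  case "$1" in
    remote|local|yarn-per-job|yarn-session|kubernetes-session) return 0 ;;
    yarn-application|kubernetes-application) return 0 ;;
    *) return 1 ;;
  esac
}

# yarn-cluster is the keyword used with -m, so it is included here as a contrast.
for t in yarn-per-job yarn-cluster; do
  if is_known_target "$t"; then
    echo "$t: listed as a -t value"
  else
    echo "$t: not listed as a -t value"
  fi
done
```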

Cheers,

Till

On Tue, Nov 10, 2020 at 4:00 PM Kostas Kloudas <[hidden email]> wrote:

[...]

Marco Villalobos-2

Re: CLI help, documentation is confusing...

Hi Till,

Thank you for following up.

We were trying to set up S3 file sinks, and RocksDB with S3 checkpointing. We upgraded to Flink 1.11 and attempted to run the job in EMR.

On startup, the logs showed an error that the flink-conf.yaml could not be found. I tried to troubleshoot the command line parameters, but the documentation confused me very much.

My co-worker fixed the issue. It turns out that the Hadoop configuration files in EMR were not set up to work with the s3a protocol out of the box. Once we placed the correct values in the Hadoop configuration file, everything worked.
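
For anyone hitting the same problem, a quick first check is whether the Hadoop configuration mentions any s3a settings at all. The sketch below only looks for property names starting with fs.s3a. and does not say which values are needed. The default path /etc/hadoop/conf/core-site.xml is an assumption and may differ on your cluster.

```shell
#!/bin/sh
# Succeeds if the given Hadoop configuration file defines any fs.s3a.* property.
has_s3a_settings() {
  grep -q "<name>fs\.s3a\." "$1" 2>/dev/null
}

# Assumed location; override by setting HADOOP_CONF_DIR.
CORE_SITE="${HADOOP_CONF_DIR:-/etc/hadoop/conf}/core-site.xml"

if has_s3a_settings "$CORE_SITE"; then
  echo "s3a properties found in $CORE_SITE"
else
  echo "no s3a properties found in $CORE_SITE"
fi
```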

Marco A. Villalobos

On Nov 13, 2020, at 7:32 AM, Till Rohrmann <[hidden email]> wrote:

[...]

Till Rohrmann

Re: CLI help, documentation is confusing...

Great to hear that you solved the problem!

Cheers,

Till

On Fri, Nov 13, 2020 at 4:56 PM Marco Villalobos <[hidden email]> wrote:

[...]