The flink CLI documentation says that the -m option specifies the job manager, but the examples pass in an execution target. I am quite confused by this.

    ./bin/flink run -m yarn-cluster \
        ./examples/batch/WordCount.jar \
        --input hdfs:///user/hamlet.txt --output hdfs:///user/wordcount_out

So what is it?

I am trying to run Flink in EMR 6.1.0 but I have failed. It appears as though some of the command line parameters changed from version 1.10 to 1.11. For example, -yna is now -ynm, and -e is now -t. But I am still confused by the -m option in both versions of the documentation.

Can somebody please explain? |
Hi Marco,

I agree with you that the -m help message is misleading, but I do not think it has changed between releases. You can specify the address of the jobmanager or, for example, you can put "-m yarn-cluster", and depending on your environment setup Flink will pick up a session cluster or will create a per-job cluster. This was always the case.

As for -t and -e, the change is that -e was deprecated (although it is still active) in favour of -t. It still has the same meaning.

Finally, on how to run Flink on EMR, I am not an expert, so I will pull in Till, who may have some input.

Cheers,
Kostas |
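To make the two uses of -m described above concrete, here is a minimal dry-run sketch. It only prints the command lines and does not invoke Flink. The address `jobmanager-host:8081` and the helper name `flink_run_m` are hypothetical and chosen for illustration; the jar path is the one from the original question.

```shell
#!/bin/sh
# Dry-run sketch: prints the command lines instead of executing them.
# Assumption: "jobmanager-host:8081" is a made-up address, not from the thread.
JAR=./examples/batch/WordCount.jar

# Build a `flink run` command line for a given -m value (hypothetical helper).
flink_run_m() {
    printf './bin/flink run -m %s %s\n' "$1" "$JAR"
}

# Form 1: -m holds the address of an already running JobManager.
flink_run_m jobmanager-host:8081

# Form 2: -m holds the special value yarn-cluster, so the CLI targets YARN.
flink_run_m yarn-cluster
```

Copying one of the printed lines into a shell on a machine with a Flink distribution would run the actual submission.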
Hi Marco,

as Klou said, -m yarn-cluster should try to deploy a Yarn per-job cluster on your Yarn cluster. Could you maybe share a few more details about what is going wrong? E.g. the CLI logs could be helpful to pinpoint the problem.

I've tested that both `bin/flink run -m yarn-cluster examples/streaming/WindowJoin.jar` and `bin/flink run -t yarn-per-job examples/streaming/WindowJoin.jar` start a Flink per-job cluster.

What was -yna supposed to do? -ynm should set the custom name of the Yarn application.

[hidden email] should we maybe improve the existing documentation to better reflect the usage of -t/--target? The CLI documentation [1] does not include a single example where we use the target option. Moreover, we could think about retiring -m yarn-cluster in favour of -t yarn-per-job. Should we also document somewhere which `execution.target` values are supported? What do you think? |
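The two per-job invocations mentioned above, plus the -ynm naming option, can be laid out side by side as follows. This is a dry-run sketch that only prints the commands; the application name `window-join-demo` and the helper names `legacy_cmd` and `target_cmd` are hypothetical, and combining -ynm with -m yarn-cluster is an assumption based on the description of -ynm in this thread.

```shell
#!/bin/sh
# Dry-run sketch: prints the command lines instead of executing them.
JAR=examples/streaming/WindowJoin.jar
APP_NAME=window-join-demo   # hypothetical Yarn application name

# Legacy form: -m yarn-cluster, with -ynm setting the Yarn application name.
legacy_cmd() {
    printf 'bin/flink run -m yarn-cluster -ynm %s %s\n' "$1" "$JAR"
}

# Newer form: -t/--target selects the execution target explicitly.
target_cmd() {
    printf 'bin/flink run -t yarn-per-job %s\n' "$JAR"
}

legacy_cmd "$APP_NAME"
target_cmd
```

Both printed commands are expected to start a per-job cluster on Yarn, as reported in the reply above; the sketch itself does not verify that.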
Hi Till,
Thank you for following up. We were trying to set up S3 file sinks, and RocksDB with S3 checkpointing. We upgraded to Flink 1.11 and attempted to run the job in EMR. On startup, the logs showed an error saying that flink-conf.yaml could not be found. I tried to troubleshoot the command line parameters, but the documentation confused me very much.

My co-worker fixed the issue. It turns out that the Hadoop configuration files in EMR were not set up to work with the s3a protocol out of the box. Once we placed the correct values in the Hadoop configuration file, everything worked.

Marco A. Villalobos
|
Great to hear that you solved the problem!

Cheers,
Till
|