Typically, when one wants to run a Flink job on a Hadoop YARN installation, one creates a YARN session (e.g. ./bin/yarn-session.sh -n 4 -qu test-yarn-queue) and then runs the intended Flink job(s) (e.g. ./bin/flink run -c MyFlinkApp -m job-manager-host:job-manager-port <overriding app config params> myapp.jar) against the Flink cluster whose JobManager address is printed by the first command.
My questions are:

- Does yarn-session.sh need conf/flink-conf.yaml to be available in the Flink installation on every container in YARN? If this file is needed, how can one run different YARN sessions (with potentially very different configurations) on the same Hadoop YARN installation simultaneously?
- Is it possible to start the YARN session programmatically? If yes, I believe I should look at classes like YarnClusterClient. Is that right? Is there any other guidance on how to do this programmatically (e.g. I have a management UI that wants to start/stop YARN sessions and deploy Flink jobs to them)?

Regards,
Kedar
Hello,
I think flink-conf.yaml should only be required on the node on which you call yarn-session.sh. For starting the session cluster programmatically you would have to look into the YarnClusterDescriptor (for starting the session cluster) and the YarnClusterClient (for submitting jobs; you get the client from the cluster descriptor). Do note however that these are internal APIs: they may or may not be documented, they may rely on specific behavior of the CLI, and there are no API stability guarantees. The YARNSessionFIFOITCase may provide some hints on how to use them.
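Very roughly, starting a session from code could look like the sketch below. This is against the 1.4-era internal API, so treat every class and method name here as an assumption to verify against the sources of your Flink version; FlinkYarnSessionCli is the authoritative reference for how the descriptor is wired up.

    import org.apache.flink.configuration.Configuration;
    import org.apache.flink.configuration.GlobalConfiguration;
    import org.apache.flink.yarn.YarnClusterClient;
    import org.apache.flink.yarn.YarnClusterDescriptor;

    public class SessionStarter {
        public static void main(String[] args) throws Exception {
            // Hypothetical per-session config dir: each session loads its own
            // flink-conf.yaml, so differently configured sessions can coexist.
            String confDir = "/path/to/session-conf";
            Configuration flinkConf = GlobalConfiguration.loadConfiguration(confDir);

            YarnClusterDescriptor descriptor = new YarnClusterDescriptor(flinkConf, confDir);
            // Sizing/queue setters as used by FlinkYarnSessionCli (1.4-era names, may differ):
            descriptor.setTaskManagerCount(4);
            descriptor.setQueue("test-yarn-queue");

            // Deploys the session on YARN and returns a client bound to it.
            YarnClusterClient cluster = descriptor.deploySessionCluster();
            System.out.println("JobManager at " + cluster.getJobManagerAddress());

            // Jobs are then submitted through the client, e.g.
            //   cluster.run(packagedProgram, parallelism);
            // and the session is torn down with
            //   cluster.shutdownCluster();
        }
    }

A separate configuration directory per session would then also answer your first question: each descriptor can load its own flink-conf.yaml on the submitting node, so sessions with very different configurations can run on the same YARN installation side by side.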