Flink on YARN: Where to install Flink binaries?

Piper Piper
Hello,

I have a YARN/Hadoop 2.7.6 cluster, on which I plan to run Flink in Job mode using:
Flink 1.9.1 (with Flink application programs written in Java)
Prebundled Hadoop 2.7.5

Question 1: Which Scala version must I choose for the Flink 1.9.1 binary (2.11 or 2.12)?

Secondly, I read in a document or mailing list thread (which I can no longer find) that the Flink binaries do not need to be installed on any of the YARN cluster nodes. Instead, they only need to be installed on the client that submits the Flink job to the YARN cluster.

Question 2: Can someone please confirm and clarify the above point for me? What is this client?

1. Can the client be one of the YARN cluster nodes (the NameNode, the ResourceManager node, or a worker node)?

2. Can the client be a remote desktop machine that is not part of the YARN cluster?

Question 3: How do I determine the value to use for the YARN_CONF_DIR or HADOOP_CONF_DIR environment variable on a remote desktop client?

Thanks,

Piper

Re: Flink on YARN: Where to install Flink binaries?

Till Rohrmann
Hi Piper,

Answer 1: You should pick the Scala version you are using in your user program. If you don't use Scala at all, then pick 2.11.
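Put differently, the Scala suffix of the Flink distribution should match the Scala suffix of the Flink artifacts your program depends on. The Java sketch below checks exactly that. The names `flink-1.9.1-bin-scala_2.11.tgz`, `flink-streaming-java_2.11` and `flink-clients_2.11` are assumed to follow Flink 1.9's `_<scala-version>` naming convention and serve only as examples; the `ScalaSuffixCheck` class is hypothetical, not part of Flink.

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/**
 * Hypothetical helper: checks that the Scala suffix of a Flink distribution
 * matches the Scala suffix of the Flink artifacts a program depends on.
 */
public class ScalaSuffixCheck {

    // Matches a trailing "_2.11" / "_2.12" style suffix, optionally followed by ".tgz".
    private static final Pattern SUFFIX = Pattern.compile("_(2\\.\\d+)(?:\\.tgz)?$");

    /** Returns the Scala binary version encoded in a name, or null if there is none. */
    public static String scalaSuffix(String name) {
        Matcher m = SUFFIX.matcher(name);
        return m.find() ? m.group(1) : null;
    }

    /** True if every suffixed artifact uses the same Scala version as the distribution. */
    public static boolean consistent(String distribution, List<String> artifactIds) {
        String dist = scalaSuffix(distribution);
        if (dist == null) {
            throw new IllegalArgumentException("No Scala suffix in " + distribution);
        }
        for (String id : artifactIds) {
            String s = scalaSuffix(id);
            // Artifacts without a suffix (e.g. flink-core) do not depend on Scala.
            if (s != null && !s.equals(dist)) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        List<String> deps = Arrays.asList("flink-core", "flink-streaming-java_2.11", "flink-clients_2.11");
        System.out.println(consistent("flink-1.9.1-bin-scala_2.11.tgz", deps)); // true
        System.out.println(consistent("flink-1.9.1-bin-scala_2.12.tgz", deps)); // false
    }
}
```

For a pure Java program, following Till's advice means the Scala 2.11 distribution together with the `_2.11` variants of any suffixed Flink dependencies.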
Answer 2: Flink does not need to be installed on the YARN nodes. The client is the machine from which you start the Flink cluster, and it needs access to the Hadoop/YARN cluster. Hence you should set HADOOP_CONF_DIR to the directory containing the Hadoop configuration.
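As a rough illustration of the client's role, the Java sketch below runs on the client machine. It assumes the client already holds a copy of the cluster's Hadoop client configuration directory (for example, copied from a cluster node). It resolves YARN_CONF_DIR or HADOOP_CONF_DIR, checks for `core-site.xml` and `yarn-site.xml` (an assumed minimal set), and builds a per-job submission with `bin/flink run -m yarn-cluster <jar>`, the Flink 1.9 CLI form for starting a per-job cluster on YARN. The lookup order of the two variables is this sketch's own choice and not necessarily Flink's. The class `YarnClientPrep` is hypothetical.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;

/**
 * Hypothetical client-side preflight before submitting a Flink job to YARN.
 * Assumption: the client holds a copy of the cluster's Hadoop client configuration.
 */
public class YarnClientPrep {

    // Assumed minimal set of files a Hadoop/YARN client configuration directory contains.
    static final List<String> REQUIRED = Arrays.asList("core-site.xml", "yarn-site.xml");

    /** Resolves the config dir; this sketch checks YARN_CONF_DIR first, then HADOOP_CONF_DIR. */
    public static File resolveConfDir(Map<String, String> env) {
        for (String var : Arrays.asList("YARN_CONF_DIR", "HADOOP_CONF_DIR")) {
            String value = env.get(var);
            if (value != null && !value.isEmpty()) {
                return new File(value);
            }
        }
        throw new IllegalStateException("Neither YARN_CONF_DIR nor HADOOP_CONF_DIR is set");
    }

    /** Returns the required files missing from dir (an empty list means the dir looks usable). */
    public static List<String> missingFiles(File dir) {
        List<String> missing = new ArrayList<>();
        for (String name : REQUIRED) {
            if (!new File(dir, name).isFile()) {
                missing.add(name);
            }
        }
        return missing;
    }

    /** Builds (but does not start) a per-job submission through the Flink CLI. */
    public static ProcessBuilder submitCommand(File flinkHome, File confDir, String jobJar) {
        ProcessBuilder pb = new ProcessBuilder(
                new File(flinkHome, "bin/flink").getPath(), "run", "-m", "yarn-cluster", jobJar);
        // Expose the Hadoop configuration to the Flink client process.
        pb.environment().put("HADOOP_CONF_DIR", confDir.getAbsolutePath());
        return pb.inheritIO();
    }

    public static void main(String[] args) throws Exception {
        File confDir = resolveConfDir(System.getenv());
        List<String> missing = missingFiles(confDir);
        if (!missing.isEmpty()) {
            System.err.println("Missing in " + confDir + ": " + missing);
            System.exit(1);
        }
        if (args.length == 2) {
            // args[0] = Flink installation directory, args[1] = path to the job JAR
            int exit = submitCommand(new File(args[0]), confDir, args[1]).start().waitFor();
            System.exit(exit);
        }
        System.out.println("Hadoop configuration found in " + confDir);
    }
}
```

Usage on the client might look like `java YarnClientPrep /opt/flink-1.9.1 my-job.jar`; both paths are placeholders. Nothing here runs on the YARN nodes, which matches the point that Flink only has to be present on the client.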

Cheers,
Till

In reply to Piper Piper's message of Wed, Dec 4, 2019 at 11:04 AM.

Re: Flink on YARN: Where to install Flink binaries?

Piper Piper
Thank you, Till!

In reply to Till Rohrmann's message of Wed, Dec 4, 2019, 5:51 AM.