How to send local files to a flink job on YARN

Guy Harmach

Hi,

I’m running a Flink job on YARN. I’d like to pass YAML configuration files to the job.

I tried to use the Flink CLI --yarnship flag to point to a directory containing the file, but wasn’t able to access it from within the job.

Can someone give an example of how to send local files and how to read them in the job?

Thanks, Guy

This message and the information contained herein is proprietary and confidential and subject to the Amdocs policy statement, you may review at https://www.amdocs.com/about/email-disclaimer

Re: How to send local files to a flink job on YARN

Jörn Franke
Putting a configuration file on every node does not sound like a good idea.

What about ZooKeeper?

On 13. Jul 2017, at 17:10, Guy Harmach <[hidden email]> wrote:

RE: How to send local files to a flink job on YARN

Guy Harmach

Hi,

Just to clarify my need: I want to send the file from the local file system to the job entry point, read it in the main method, and build my sources, operations and sinks according to its content.

I assumed from the CLI usage description of the yarnship flag that it is the equivalent of Spark’s --files flag, which is used to pass local files to the driver.

Is there any solution other than manually copying the file to HDFS (and deleting it afterwards)?

From: Jörn Franke [mailto:[hidden email]]
Sent: Thursday, July 13, 2017 6:36 PM
To: Guy Harmach <[hidden email]>
Cc: [hidden email]
Subject: Re: How to send local files to a flink job on YARN

Re: How to send local files to a flink job on YARN

Ted Yu
I went back to commit 6e38eb8:
[FLINK-1436] [docs] update command line documentation

A search of the repo for "yarnship" returned no hits in the code (same with commit bf6b9aaab89e2e04678784525a42a19f099aa7f5, which is at the top of the git repo).

I wonder whether it is supported.

Re: How to send local files to a flink job on YARN

Aljoscha Krettek
There’s a bit of a misconception here: in Flink there is no “driver” as there is in Spark, and the entry point of your program (“main()”) is not executed on the cluster but in the “client”. The main method is only responsible for constructing a program graph, which is then shipped to the cluster; at that point the client (or the “main()” method) can shut down. In your concrete case, this means that the main() method is not executed in the YARN context, i.e. it does not have access to the files that you specified with the “--yarnship” option.
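
A minimal sketch of what this implies for the original use case, assuming a plain Java properties file instead of YAML (to avoid a YAML library dependency): read the local file in main() on the client, turn its content into ordinary values, and pass those values into the user functions, which are serialized together with the program graph. The Java serialization round trip below only simulates the shipping step; `ClientSideConfigSketch`, `PrefixMapper`, `ship` and the `output.prefix` key are hypothetical names, not Flink API. In a real job, Flink’s `ParameterTool.fromPropertiesFile` is one existing helper for the loading step.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Reader;
import java.io.Serializable;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Properties;
import java.util.function.Function;

public class ClientSideConfigSketch {

    // Hypothetical user function. It implements Serializable because the
    // values it captures must survive the trip from client to cluster.
    static class PrefixMapper implements Function<String, String>, Serializable {
        private static final long serialVersionUID = 1L;
        private final String prefix; // taken from the local config file on the client

        PrefixMapper(String prefix) {
            this.prefix = prefix;
        }

        @Override
        public String apply(String value) {
            return prefix + value;
        }
    }

    // Runs in main(), i.e. on the client: the file only has to exist locally.
    static Properties loadLocalConfig(Path path) {
        Properties props = new Properties();
        try (Reader reader = Files.newBufferedReader(path, StandardCharsets.UTF_8)) {
            props.load(reader);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return props;
    }

    // Simulates shipping: serialize on the "client", deserialize on a "worker".
    @SuppressWarnings("unchecked")
    static <T extends Serializable> T ship(T obj) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(obj);
            }
            try (ObjectInputStream in =
                     new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
                return (T) in.readObject();
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) throws IOException {
        // Stand-in for the user's local configuration file.
        Path conf = Files.createTempFile("job-config", ".properties");
        Files.write(conf, "output.prefix=event-\n".getBytes(StandardCharsets.UTF_8));

        Properties props = loadLocalConfig(conf); // read on the client
        PrefixMapper mapper = new PrefixMapper(props.getProperty("output.prefix", ""));
        Files.delete(conf); // the file is not needed after this point

        PrefixMapper onWorker = ship(mapper); // only the captured value travels
        System.out.println(onWorker.apply("42")); // prints "event-42"
    }
}
```

Because the file is read on the client, it never needs to exist on the YARN nodes; only the values captured in the serializable functions travel to the cluster.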

Regarding “--yarnship” in general, I have descended into the depths of the Flink YARN support and this is how it works:
FlinkYarnSessionCli is the piece of code that acts as the entry point when specifying “-m yarn-cluster” at the command line. This is the place where the options are defined: https://github.com/apache/flink/blob/f839018131024860a1b25b13cea7e1313add28d5/flink-yarn/src/main/java/org/apache/flink/yarn/cli/FlinkYarnSessionCli.java#L138-L138. The options are not hardcoded but have a dynamic prefix; normally the short prefix is “y” and the long prefix is “yarn”. In there you see:

shipPath = new Option(shortPrefix + "t", longPrefix + "ship", true, "Ship files in the specified directory (t for transfer)");

This translates to the -yt and --yarnship parameters.
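
To illustrate why a plain text search for "yarnship" finds nothing, here is a hypothetical, stripped-down version of the option construction above, using plain strings instead of the Apache Commons CLI `Option` class that the quoted line uses. `YarnOptionNames`, `OptionNames` and `shipOption` are illustrative names only; the point is that the full long name only exists after concatenation at runtime.

```java
public class YarnOptionNames {

    // Simplified stand-in for an option definition: only the names matter here.
    static final class OptionNames {
        final String shortName;
        final String longName;

        OptionNames(String shortName, String longName) {
            this.shortName = shortName;
            this.longName = longName;
        }

        @Override
        public String toString() {
            return "-" + shortName + " / --" + longName;
        }
    }

    // Mirrors the concatenation in the quoted FlinkYarnSessionCli line:
    // the full names are assembled at runtime, so the literal "yarnship"
    // never has to appear in the source code.
    static OptionNames shipOption(String shortPrefix, String longPrefix) {
        return new OptionNames(shortPrefix + "t", longPrefix + "ship");
    }

    public static void main(String[] args) {
        // "y"/"yarn" are the usual prefixes, per the explanation above.
        System.out.println(shipOption("y", "yarn")); // prints "-yt / --yarnship"
        // Illustration only: a different prefix pair yields different names.
        System.out.println(shipOption("", ""));      // prints "-t / --ship"
    }
}
```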

As to how FlinkYarnSessionCli is used when specifying “-m yarn-cluster”, this happens here: https://github.com/apache/flink/blob/4aa2ffcef8edae574ec270631841ef4a0c793dec/flink-clients/src/main/java/org/apache/flink/client/CliFrontend.java#L136-L136. Essentially, a “CustomCommandLine” subclass is responsible for handling the user invocation, and the subclasses can announce that they would like to handle the user command line based on certain settings. For example, FlinkYarnSessionCli will announce that it can handle a command line when the “-m yarn-cluster” option is present: https://github.com/apache/flink/blob/f839018131024860a1b25b13cea7e1313add28d5/flink-yarn/src/main/java/org/apache/flink/yarn/cli/FlinkYarnSessionCli.java#L493-L493. The CliFrontend will loop through the list of registered CustomCommandLine instances and pick the first one that announces that it would like to handle a given invocation: https://github.com/apache/flink/blob/4aa2ffcef8edae574ec270631841ef4a0c793dec/flink-clients/src/main/java/org/apache/flink/client/CliFrontend.java#L1174-L1174
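
The selection mechanism described above boils down to a first-match loop over an ordered list. The sketch below assumes a heavily simplified, hypothetical `CustomCommandLine` interface (`id` and `isActive` over a list of raw arguments), plus hypothetical `YarnCommandLine` and `DefaultCommandLine` implementations; Flink’s actual interface has a different signature and more responsibilities.

```java
import java.util.List;
import java.util.Optional;

public class CommandLineSelection {

    // Hypothetical, simplified stand-in for Flink's CustomCommandLine.
    interface CustomCommandLine {
        String id();
        boolean isActive(List<String> args);
    }

    // Announces that it wants the invocation when "-m yarn-cluster" is present.
    static final class YarnCommandLine implements CustomCommandLine {
        public String id() { return "yarn-cluster"; }

        public boolean isActive(List<String> args) {
            int i = args.indexOf("-m");
            return i >= 0 && i + 1 < args.size() && args.get(i + 1).equals("yarn-cluster");
        }
    }

    // Illustrative fallback that accepts every invocation.
    static final class DefaultCommandLine implements CustomCommandLine {
        public String id() { return "default"; }

        public boolean isActive(List<String> args) { return true; }
    }

    // Loops through the registered command lines in order and picks the
    // first one that reports it is active.
    static Optional<CustomCommandLine> select(List<CustomCommandLine> registered, List<String> args) {
        for (CustomCommandLine cli : registered) {
            if (cli.isActive(args)) {
                return Optional.of(cli);
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        List<CustomCommandLine> registered = List.of(new YarnCommandLine(), new DefaultCommandLine());
        List<String> yarnRun = List.of("run", "-m", "yarn-cluster", "-yt", "conf", "job.jar");
        List<String> plainRun = List.of("run", "job.jar");

        System.out.println(select(registered, yarnRun).map(CustomCommandLine::id).orElse("none"));  // prints "yarn-cluster"
        System.out.println(select(registered, plainRun).map(CustomCommandLine::id).orElse("none")); // prints "default"
    }
}
```

Because the fallback accepts every invocation, registration order matters in this sketch: it has to come last, otherwise it would shadow the YARN command line.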

This is very convoluted, but I hope my explanations help.

Best,
Aljoscha

On 13. Jul 2017, at 18:02, Ted Yu <[hidden email]> wrote: