Flink on yarn use jar on hdfs

4 messages

Flink on yarn use jar on hdfs

ysn2233
Hi everyone!
I found that every time I start a Flink-on-YARN application, the client ships the flink-uber jar and other dependencies to HDFS before starting the ApplicationMaster. Is there any approach to keep the flink-uber jar and other library jars at a fixed location on HDFS so that only the configuration files are shipped, and YARN can start the Flink ApplicationMaster from the jars already on HDFS? Thank you very much!


Re: Flink on yarn use jar on hdfs

Yang Wang
Hi Shengnan,

I think you mean avoiding uploading the flink-dist jars on every submission.
I have created a JIRA [1] to use the YARN public cache to speed up the launch
of the JobManager and TaskManagers. After this feature is merged, you could submit a Flink job like this:

./bin/flink run -d -m yarn-cluster -p 20 -ysl hdfs:///flink/release/flink-1.9.0/lib examples/streaming/WindowJoin.jar

[1]. https://issues.apache.org/jira/browse/FLINK-13938
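Once the feature lands, the intended workflow would presumably be to stage the Flink lib directory on HDFS once, then point every submission at it. A sketch of that two-step flow (the HDFS path and the -ysl flag are taken from the example above; the staging commands assume a standard Hadoop client and that the local lib directory layout matches the distribution):

```shell
# One-time setup: copy the Flink distribution jars to a fixed, world-readable
# HDFS location (path is illustrative).
hdfs dfs -mkdir -p /flink/release/flink-1.9.0/lib
hdfs dfs -put flink-1.9.0/lib/*.jar /flink/release/flink-1.9.0/lib

# Per-job submission: point -ysl at the shared lib so the client ships only
# job-specific files instead of re-uploading the distribution jars.
./bin/flink run -d -m yarn-cluster -p 20 \
    -ysl hdfs:///flink/release/flink-1.9.0/lib \
    examples/streaming/WindowJoin.jar
```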

Shengnan YU <[hidden email]> wrote on Monday, September 16, 2019 at 2:24 PM:


Re: Flink on yarn use jar on hdfs

ysn2233

And could you please share your GitHub account with me? I am interested in following you to see how you implement this feature. Thank you.
On 9/16/2019 at 14:44, [hidden email] wrote:


Re: Flink on yarn use jar on hdfs

Yang Wang
Hi Shengnan,

Sorry for the late reply. I will attach a PR to FLINK-13938 this week.
If we specify the shared lib (-ysl), none of the jars in the lib directory of the Flink client will be uploaded.
Instead, we will use the HDFS path to set the LocalResource for YARN,
and the visibility of that LocalResource will be PUBLIC, so the distributed cache can be shared by all containers,
even across different applications. We have used this in production and found that it speeds up the launch of both the JobManager and the TaskManagers.
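For illustration, the public-cache mechanism described here corresponds roughly to registering a jar that already lives on HDFS as a YARN LocalResource with PUBLIC visibility. A hypothetical sketch against the Hadoop YARN API (this is not Flink's actual code; the helper name is invented, and running it requires Hadoop client libraries and a cluster):

```java
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.yarn.api.records.LocalResource;
import org.apache.hadoop.yarn.api.records.LocalResourceType;
import org.apache.hadoop.yarn.api.records.LocalResourceVisibility;
import org.apache.hadoop.yarn.util.ConverterUtils;

public class SharedLibExample {

    // Register a jar already on HDFS as a PUBLIC LocalResource. The
    // NodeManager then localizes it once into the public cache and shares
    // it across containers and applications, instead of each application
    // uploading and localizing its own private copy.
    static LocalResource registerSharedJar(FileSystem fs, Path hdfsJar) throws Exception {
        FileStatus status = fs.getFileStatus(hdfsJar);
        return LocalResource.newInstance(
            ConverterUtils.getYarnUrlFromPath(hdfsJar), // HDFS location; no upload
            LocalResourceType.FILE,
            LocalResourceVisibility.PUBLIC,             // shared via the public cache
            status.getLen(),                            // size and timestamp let YARN
            status.getModificationTime());              // validate the cached copy
    }
}
```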

Thanks,
Yang

Shengnan YU <[hidden email]> wrote on Monday, September 16, 2019 at 3:07 PM: