Hi folks,

Recently, our customers have asked for a feature to configure a remote flink jar. I'd like to reach out to you to see whether it is a general need.

At the moment, Flink only supports configuring a local file as the flink jar via the `-yj` option. If we pass an HDFS file path, it fails with an IllegalArgumentException due to an implementation detail. If we support configuring a remote flink jar, this limitation is eliminated. We can also make use of YARN locality to reduce upload overhead: instead of uploading the jar, we ask YARN to localize it when the AM container starts.

Besides, this possibly overlaps with FLINK-13938, so I'd like to put the discussion on our mailing list first. Are you looking forward to such a feature?

@Yang Wang: this feature is different from what we discussed offline; it only focuses on the flink jar, not all ship files.

Best,
tison.
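To make the proposal concrete, here is a sketch of the intended CLI behavior. The HDFS path and version numbers are hypothetical; today only the local-file form of `-yj` works.

```shell
# Works today: the flink jar is a local file, re-uploaded on every submission.
flink run -m yarn-cluster -yj /opt/flink/lib/flink-dist_2.11-1.9.1.jar ./WordCount.jar

# Proposed: point -yj at a jar already on HDFS. Instead of uploading,
# YARN would localize the jar when the AM container starts.
# With the current implementation this fails with IllegalArgumentException.
flink run -m yarn-cluster -yj hdfs:///flink/lib/flink-dist_2.11-1.9.1.jar ./WordCount.jar
```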
Hi tison,

Thanks for starting this discussion.

* For a user-customized flink-dist jar, it is a useful feature, since it avoids uploading the flink-dist jar every time. Especially in a production environment, it could accelerate the submission process.
* For the standard flink-dist jar, FLINK-13938[1] could solve the problem: upload an official flink release binary to distributed storage (HDFS) first, and then all submissions benefit from it. Users could also upload a customized flink-dist jar to accelerate their submissions.

If the flink-dist jar can be specified as a remote path, maybe the user jar should get the same treatment.

tison <[hidden email]> wrote on Tue, Nov 19, 2019, 11:17 AM:
There is a related use case (not specific to HDFS) that I came across: it would be nice if the jar upload endpoint could accept the URL of a jar file as an alternative to the jar file itself. Such a URL could point to an artifactory or a distributed file system.

Thomas

On Mon, Nov 18, 2019 at 7:40 PM Yang Wang <[hidden email]> wrote:
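A sketch of what such an endpoint extension might look like, next to the existing multipart upload. The `jarUrl` request body is hypothetical; only the multipart form exists in the current REST API.

```shell
# Existing REST endpoint: POST the jar file itself as multipart form data.
curl -X POST -H "Expect:" -F "jarfile=@./WordCount.jar" \
  http://jobmanager:8081/jars/upload

# Hypothetical variant: submit only a URL and let the cluster fetch the jar
# from an artifactory or a distributed file system.
curl -X POST -H "Content-Type: application/json" \
  -d '{"jarUrl": "https://artifactory.example.com/flink/WordCount.jar"}' \
  http://jobmanager:8081/jars/upload
```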
I have implemented this feature in our environment, using Docker's 'Init Container' to fetch the jar file from a URL. It seems a good idea.
On 11/19/2019 12:11, [hidden email] wrote:
Would that be a feature specific to YARN (and maybe standalone sessions)? For containerized setups, an init container seems like a nice way to solve this. It is also more flexible when it comes to supporting authentication mechanisms for the target storage system, etc.

On Tue, Nov 19, 2019 at 5:29 PM ouywl <[hidden email]> wrote:
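For the containerized case, the init-container approach amounts to fetching the jar into a volume shared with the main Flink container before it starts. A minimal sketch of what such an init container might run (the path and URL are hypothetical):

```shell
# Runs inside an init container. /opt/flink/artifacts is a volume shared
# with the main Flink container, which only starts after this succeeds,
# so the jar is guaranteed to be present before Flink boots.
wget -O /opt/flink/artifacts/job.jar \
  https://artifactory.example.com/flink/job.jar
```

Authentication against the storage system can then live entirely in the init container's image and secrets, without Flink needing to know about it.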
Thanks for your participation!

@Yang: Great to hear. I'd like to know whether a remote flink jar path conflicts with FLINK-13938. IIRC FLINK-13938 automatically excludes the local flink jar from shipping, which possibly does not work for a remote one.

@Thomas: It is inspiring that a URL becomes the unified representation of a resource. I'm thinking of how to serve a unified process that gets a resource from a URL pointing to an artifactory or a distributed file system.

@ouywl & Stephan: Yes, this improvement can be migrated to environments like k8s. IIRC the k8s proposal already discussed improvements using "init containers" and other technologies. However, so far I regard it as an improvement that differs from one storage to another, so we can achieve them individually.

Best,
tison.

Stephan Ewen <[hidden email]> wrote on Wed, Nov 20, 2019, 12:34 AM:
Thanks @Tison for starting the discussion, and sorry for joining so late. Yes, I think this is a very good idea.

We already tweak the flink-yarn package internally to support something similar to what @Thomas mentioned: registering a jar that has already been uploaded to some DFS (it need not be the YARN public cache discussed in FLINK-13938). The reason is that we provide internally packaged extension libraries to our customers, and we've seen good performance improvements in our YARN cluster during the container localization phase after our customers switched to pre-uploaded jars instead of having to upload them on every deployment.

Looking forward to this feature!

--
Rong

On Tue, Nov 19, 2019 at 10:19 PM tison <[hidden email]> wrote: