[DISCUSS] Support configure remote flink jar

tison
Hi folks,

Recently, our customers have asked for a feature to configure a remote Flink jar. I'd like to reach out to you
to see whether this is a general need.

At the moment, Flink only supports configuring a local file as the Flink jar via the `-yj` option. If we pass an HDFS file
path, it fails with an IllegalArgumentException due to an implementation detail. If we support
configuring a remote Flink jar, this limitation is eliminated. We can also make use of YARN locality to
reduce the uploading overhead: instead of uploading the jar, we ask YARN to localize it when the AM container starts.
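
To make the local/remote distinction concrete, here is a minimal sketch. It is not Flink's actual implementation; the class `FlinkJarLocation`, its methods, and the example paths are all hypothetical. It classifies a configured jar path by its URI scheme and reports which of the two handling strategies described above would apply.

```java
import java.net.URI;

public class FlinkJarLocation {

    /** Returns true when the path has no scheme or the "file" scheme, i.e. it is local. */
    public static boolean isLocal(String path) {
        // URI.create throws IllegalArgumentException for strings that are not valid URIs
        // (for example, paths containing spaces).
        String scheme = URI.create(path).getScheme();
        return scheme == null || scheme.equalsIgnoreCase("file");
    }

    /** Describes what a client could do with the configured jar (illustrative only). */
    public static String plan(String path) {
        return isLocal(path)
                ? "upload " + path + " from the client, then register it with YARN"
                : "register " + path + " as is and let YARN localize it";
    }

    public static void main(String[] args) {
        System.out.println(plan("/opt/flink/lib/flink-dist.jar"));
        System.out.println(plan("hdfs:///flink/flink-dist.jar"));
    }
}
```

The point of the sketch is only that the decision can be made from the scheme before any file access happens, so a remote path never has to be treated as a local file.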

Besides, it possibly overlaps with FLINK-13938. I'd like to have the discussion on our
mailing list first.

Are you looking forward to such a feature?

@Yang Wang: this feature is different from what we discussed offline; it only focuses on the Flink jar, not
all ship files.

Best,
tison.

Re: [DISCUSS] Support configure remote flink jar

Yang Wang
Hi tison,

Thanks for starting this discussion.
* For a user-customized flink-dist jar, it is a useful feature, since it avoids uploading the flink-dist jar
on every submission. Especially in production environments, it could accelerate the submission process.
* For the standard flink-dist jar, FLINK-13938 [1] could solve the problem. Upload an official Flink release
binary to distributed storage (HDFS) first, and then all submissions can benefit from it. Users could
also upload a customized flink-dist jar to accelerate their submissions.

If the flink-dist jar can be specified as a remote path, maybe the user jar is in the same situation.

[1] https://issues.apache.org/jira/browse/FLINK-13938


tison <[hidden email]> wrote on Tue, Nov 19, 2019 at 11:17 AM:

Re: [DISCUSS] Support configure remote flink jar

Thomas Weise
There is a related use case (not specific to HDFS) that I came across:

It would be nice if the jar upload endpoint could accept the URL of a jar file as an alternative to the jar file itself. Such a URL could point to an artifactory or a distributed file system.
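
As a rough sketch of what accepting a URL could involve on the receiving side, the snippet below downloads a jar from a URL into a local directory using only the JDK. The class `JarFetcher` and its `fetch` method are hypothetical and are not part of Flink's REST API. It also assumes the URL scheme is one the JDK can open out of the box (`file:`, `http:`, `https:`); a scheme such as `hdfs:` would need a different client.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.URI;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

public class JarFetcher {

    /** Downloads the jar at {@code source} into {@code targetDir} and returns the local path. */
    public static Path fetch(URI source, Path targetDir) throws IOException {
        // Use the last path segment of the URL as the local file name.
        String path = source.getPath();
        String fileName = path.substring(path.lastIndexOf('/') + 1);
        Files.createDirectories(targetDir);
        Path target = targetDir.resolve(fileName);
        try (InputStream in = source.toURL().openStream()) {
            Files.copy(in, target, StandardCopyOption.REPLACE_EXISTING);
        }
        return target;
    }

    public static void main(String[] args) throws IOException {
        // Stand-in for a jar hosted elsewhere: a temporary file addressed by a file: URL.
        Path remote = Files.createTempFile("job", ".jar");
        Files.write(remote, new byte[] {1, 2, 3});
        Path local = fetch(remote.toUri(), Files.createTempDirectory("jars"));
        System.out.println(local + " (" + Files.size(local) + " bytes)");
    }
}
```

Authentication against the storage system, checksums, and size limits are deliberately left out here; a real endpoint would have to address them.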

Thomas


On Mon, Nov 18, 2019 at 7:40 PM Yang Wang <[hidden email]> wrote:

Re: [DISCUSS] Support configure remote flink jar

ouywl
I have implemented this feature in our environment, using a Docker ‘init container’ to fetch the jar file from a URL. It seems like a good idea.


On 11/19/2019 12:11, [hidden email] wrote:

Re: [DISCUSS] Support configure remote flink jar

Stephan Ewen
Would that be a feature specific to Yarn? (and maybe standalone sessions)

For containerized setups, an init container seems like a nice way to solve this. It is also more flexible when it comes to supporting authentication mechanisms for the target storage system, etc.

On Tue, Nov 19, 2019 at 5:29 PM ouywl <[hidden email]> wrote:

Re: [DISCUSS] Support configure remote flink jar

tison
Thanks for your participation!

@Yang: Great to hear. I'd like to know whether a remote Flink jar path conflicts with FLINK-13938. IIRC, FLINK-13938 automatically excludes the local Flink jar from shipping, which possibly does not work for a remote one.

@Thomas: It is inspiring that a URL becomes the unified representation of a resource. I'm thinking about how to provide a single process for fetching a resource from a URL that points to an artifact or a distributed file system.

@ouywl & Stephan: Yes, this improvement can be migrated to environments like k8s. IIRC, the k8s proposal already discussed improvements using an "init container" and other technologies. However, so far I regard it as an improvement that differs from one storage to another, so we can achieve them individually.


Best,
tison.


Stephan Ewen <[hidden email]> wrote on Wed, Nov 20, 2019 at 12:34 AM:

Re: [DISCUSS] Support configure remote flink jar

Rong Rong
Thanks @Tison for starting the discussion and sorry for joining so late.

Yes, I think this is a very good idea. We already tweak the flink-yarn package internally to support something similar to what @Thomas mentioned: registering a jar that has already been uploaded to some DFS (not necessarily the YARN public cache discussed in FLINK-13938).
The reason is that we provide our internally packaged extension libraries to our customers, and we've seen a good performance improvement in our YARN cluster during the container localization phase after our customers switched to pre-uploaded JARs instead of uploading them on every deployment.

Looking forward to this feature!

--
Rong


On Tue, Nov 19, 2019 at 10:19 PM tison <[hidden email]> wrote: