[DISCUSS] Remove dependency shipping through nested jars during job submission.

Kostas Kloudas-5
Hi all,

I would like to bring the discussion in
https://issues.apache.org/jira/browse/FLINK-17745 to the dev mailing
list, just to hear the opinions of the community.

In a nutshell, in the early days of Flink, users could submit their
jobs as fat jars that had a specific structure. More concretely, the
user could put the dependencies of the submitted job in a lib/ folder
within their jar. Flink would search the user's jar for such a folder
and, if it existed, extract the nested jars, ship them independently,
and add them to the classpath. Finally, it would also ship the fat jar
itself so that the user code is available on the cluster (for details
see [1]).
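
To make the mechanism described above concrete, here is a minimal sketch in plain Java. It is not the code in PackagedProgram; the class name NestedJarExtractor, the helper methods, and the demo jar contents are all hypothetical and only illustrate the idea: find lib/*.jar entries, copy each one out to its own file, and then ship those files in addition to the fat jar itself.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.ArrayList;
import java.util.Enumeration;
import java.util.List;
import java.util.jar.JarEntry;
import java.util.jar.JarFile;
import java.util.jar.JarOutputStream;

/** Hypothetical illustration of nested-jar shipping; not Flink's implementation. */
final class NestedJarExtractor {

    /** Copies every lib/*.jar entry of the fat jar into its own temporary file. */
    static List<Path> extractNestedJars(Path fatJar) {
        List<Path> extracted = new ArrayList<>();
        try (JarFile jar = new JarFile(fatJar.toFile())) {
            Enumeration<JarEntry> entries = jar.entries();
            while (entries.hasMoreElements()) {
                JarEntry entry = entries.nextElement();
                String name = entry.getName();
                if (entry.isDirectory() || !name.startsWith("lib/") || !name.endsWith(".jar")) {
                    continue;
                }
                // Keep only the last path segment so the entry name cannot steer the target path.
                String simpleName = name.substring(name.lastIndexOf('/') + 1);
                Path target = Files.createTempFile("nested-", "-" + simpleName);
                try (InputStream in = jar.getInputStream(entry)) {
                    Files.copy(in, target, StandardCopyOption.REPLACE_EXISTING);
                }
                extracted.add(target);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return extracted;
    }

    /** Everything that would be shipped: the extracted nested jars, then the fat jar itself. */
    static List<Path> filesToShip(Path fatJar) {
        List<Path> files = new ArrayList<>(extractNestedJars(fatJar));
        files.add(fatJar); // the nested jars are still inside this file, so they travel twice
        return files;
    }

    /** Builds a tiny demo fat jar; the "nested jar" is just placeholder bytes. */
    static Path buildDemoJar() {
        try {
            Path fatJar = Files.createTempFile("job-", ".jar");
            try (OutputStream out = Files.newOutputStream(fatJar);
                 JarOutputStream jar = new JarOutputStream(out)) {
                jar.putNextEntry(new JarEntry("com/example/Job.class"));
                jar.write("user code".getBytes(StandardCharsets.UTF_8));
                jar.closeEntry();
                jar.putNextEntry(new JarEntry("lib/dependency.jar"));
                jar.write("nested jar bytes".getBytes(StandardCharsets.UTF_8));
                jar.closeEntry();
            }
            return fatJar;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        Path fatJar = buildDemoJar();
        for (Path file : filesToShip(fatJar)) {
            System.out.println("would ship: " + file);
        }
    }
}
```

Note that filesToShip returns the nested jars as standalone files and the fat jar that still contains them, which is exactly where the duplication comes from.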

This way of submission was NOT documented anywhere and it has the
obvious shortcoming that the "nested" jars are shipped twice: once on
their own and once inside the fat jar. In addition, it makes the
codebase a bit more difficult to maintain, as it constitutes yet
another way of submitting jobs.

Given the above, I would like to propose removing this codepath. But
given that some users rely on this hidden feature, I would like to
discuss 1) how many such users exist, 2) how difficult it is for them
to "migrate" to a different way of submitting jobs, and 3) whether the
rest of the community agrees with removing it.

I am posting this to both the dev and user mailing lists for better
coverage.

Looking forward to a fruitful discussion,
Kostas

[1] https://github.com/apache/flink/blob/master/flink-clients/src/main/java/org/apache/flink/client/program/PackagedProgram.java#L222

Re: [DISCUSS] Remove dependency shipping through nested jars during job submission.

rmetzger0
Hi,
AFAIK, this feature was added because Hadoop MapReduce has it as well (see point 2 in https://blog.cloudera.com/how-to-include-third-party-libraries-in-your-map-reduce-job/).

I don't remember having seen this anywhere in the wild. I believe it is a good idea to simplify our codebase here.
If there are concerns, then we could at least add a big WARN log message in Flink 1.11+ that this feature will be deprecated in the future.
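
As a rough sketch of what such a warning could look like: the snippet below uses java.util.logging only to stay self-contained (Flink itself logs through SLF4J), and the class name, method name, and message text are all hypothetical, not an existing Flink API.

```java
import java.util.Arrays;
import java.util.List;
import java.util.logging.Logger;

/** Hypothetical sketch of a deprecation warning for jars that contain lib/*.jar entries. */
final class NestedJarDeprecationWarning {

    private static final Logger LOG =
            Logger.getLogger(NestedJarDeprecationWarning.class.getName());

    /** Logs a warning and returns true if any entry name looks like a nested jar under lib/. */
    static boolean warnIfNestedJars(String jarName, List<String> entryNames) {
        long nested = entryNames.stream()
                .filter(name -> name.startsWith("lib/") && name.endsWith(".jar"))
                .count();
        if (nested == 0) {
            return false;
        }
        // Message wording is an assumption; the thread only asks for "a big WARN log message".
        LOG.warning(String.format(
                "Jar %s contains %d nested jar(s) under lib/. Shipping dependencies through "
                        + "nested jars is deprecated and may be removed in a future release.",
                jarName, nested));
        return true;
    }

    public static void main(String[] args) {
        warnIfNestedJars("job.jar", Arrays.asList("com/example/Job.class", "lib/dependency.jar"));
    }
}
```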


(In reply to Kostas Kloudas's message of Wed, May 20, 2020 at 10:39 AM.)