[DISCUSS] Towards a leaner flink-dist

16 messages

[DISCUSS] Towards a leaner flink-dist

Chesnay Schepler
Hello,

The binary distribution that we currently release contains quite a lot of
optional components, including various filesystems, metric reporters and
libraries. Most users will only use a fraction of these, so for them the
rest pretty much only increases the size of flink-dist.

With Flink growing more and more in scope, I don't believe it is
feasible to ship everything we have with every distribution. Instead, I
suggest more of a "pick-what-you-need" model, where flink-dist is rather
lean and additional components are downloaded separately and added by
the user.

This would primarily affect the /opt directory, but could also be
extended to cover the flink-dist jar itself. For example, the YARN and Mesos
code could be split out into separate jars that users add to lib manually.
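
Under such a model, adding a component would amount to placing its jar in the distribution's lib directory, much as optional jars are moved from /opt to /lib today. A minimal sketch, assuming an unpacked Flink installation at `flink_home` and an already-downloaded jar; the helper `add_component` is hypothetical and not part of any Flink tooling:

```python
import shutil
from pathlib import Path


def add_component(flink_home: Path, jar: Path) -> Path:
    """Copy a separately downloaded component jar into <flink_home>/lib.

    Hypothetical helper for illustration only; it is not part of Flink.
    Returns the path of the copied jar.
    """
    if jar.suffix != ".jar":
        raise ValueError(f"expected a .jar file, got {jar.name}")
    lib_dir = flink_home / "lib"
    if not lib_dir.is_dir():
        raise FileNotFoundError(f"no lib directory under {flink_home}")
    target = lib_dir / jar.name
    shutil.copy2(jar, target)  # copy file contents plus metadata such as mtime
    return target
```

Processes started through the bin/ scripts put the jars in lib on their classpath at startup, so a running cluster would presumably need a restart to pick up a newly added component.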

Let me know what you think.

Regards,

Chesnay

Reply | Threaded
Open this post in threaded view
|

Re: [DISCUSS] Towards a leaner flink-dist

Fabian Hueske
Hi Chesnay,

Thank you for the proposal.
I think this is a good idea.
We follow a similar approach already for Hadoop dependencies and connectors (although in application space).

+1

Fabian

On Fri, Jan 18, 2019 at 10:59 AM Chesnay Schepler <[hidden email]> wrote:

Re: [DISCUSS] Towards a leaner flink-dist

Jamie Grier
I'm not sure this is necessary. It's quite convenient to be able to grab a single tarball and have everything you need.

I just did this for the latest binary release: it was 273 MB and took about 25 seconds to download. Of course, connection speeds vary quite a bit, but I don't think 273 MB is onerous to download, and I like the simplicity of the current approach.
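
For scale, these figures imply an effective throughput of about 273 MB / 25 s, or roughly 10.9 MB/s. A small sketch of how download time scales with size and bandwidth; it ignores latency and connection setup, and the second rate is an arbitrary assumption for a slower link, not a measurement:

```python
def download_seconds(size_mb: float, throughput_mb_per_s: float) -> float:
    """Idealized transfer time in seconds, ignoring latency and connection setup."""
    if throughput_mb_per_s <= 0:
        raise ValueError("throughput must be positive")
    return size_mb / throughput_mb_per_s


# Effective throughput implied by the measurement above: 273 MB in about 25 s.
observed_mb_per_s = 273 / 25  # about 10.9 MB/s

# The second rate is an assumed slower connection, for comparison only.
for label, rate in [("observed", observed_mb_per_s), ("assumed 1 MB/s", 1.0)]:
    print(f"{label}: {download_seconds(273, rate):.0f} s")
```

At the assumed 1 MB/s, the same tarball takes 273 s, roughly four and a half minutes.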



On Fri, Jan 18, 2019 at 3:34 AM Fabian Hueske <[hidden email]> wrote:

Re: [DISCUSS] Towards a leaner flink-dist

Jeff Zhang
In reply to this post by Fabian Hueske
Thanks, Chesnay, for starting this discussion. I think there are three major use cases for the Flink binary distribution:

1. Setting up a standalone cluster
2. Trying out Flink's features, e.g. via the Scala shell or the SQL client
3. Downstream projects using it to integrate Flink into their own systems

I estimated the size of the flink-dist folder: the lib folder takes around 100 MB and the opt folder around 200 MB. Overall, I agree with making flink-dist thinner.
The next question is which components to drop. I checked the opt folder, and I think the filesystem and metrics components could be moved out, because they are pluggable and, I think, only used in scenario 1 (setting up a standalone cluster). We should still keep other components like flink-table, flink-ml and flink-gelly IMHO, because new users may still use them to try out Flink's features. For me, the Scala shell is the first choice for trying new features of Flink.
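
An estimate like this can be reproduced by summing file sizes per top-level directory of an unpacked distribution. A minimal sketch; the function names are illustrative, and the actual numbers will depend on the release being measured:

```python
from pathlib import Path


def dir_size_bytes(path: Path) -> int:
    """Sum the sizes of all regular files below `path`; symlinked files are skipped."""
    return sum(
        p.stat().st_size
        for p in path.rglob("*")
        if p.is_file() and not p.is_symlink()
    )


def size_report(flink_home: Path) -> dict[str, float]:
    """Size in MiB of each top-level directory (e.g. lib, opt) of an unpacked distribution."""
    return {
        d.name: dir_size_bytes(d) / (1024 * 1024)
        for d in sorted(flink_home.iterdir())
        if d.is_dir()
    }
```

For example, `size_report(Path("flink-1.7.1"))` on an unpacked tarball (the directory name is a hypothetical example) returns one entry per directory such as bin, lib and opt; loose top-level files are not counted.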



On Fri, Jan 18, 2019 at 7:34 PM Fabian Hueske <[hidden email]> wrote:

--
Best Regards

Jeff Zhang

Re: [DISCUSS] Towards a leaner flink-dist

Stephan Ewen
There are some points where a leaner approach could help.
Many libraries and connectors are currently being added to Flink, which makes the "include all" approach not completely feasible in the long run:

  - Connectors: For a proper experience with the Shell/CLI (for example for SQL), we need a lot of fat connector jars.
    These often come in multiple versions, which alone accounts for 100s of MBs of connector jars.
  - The pre-bundled FileSystems are also on the verge of adding 100s of MBs themselves.
  - The metric reporters are growing bit by bit as well.

The following could be a compromise:

The flink-dist would include
  - the core Flink libraries (core, APIs, runtime, etc.)
  - YARN / Mesos etc. adapters
  - examples (the examples should be a small set of self-contained programs without additional dependencies)
  - default logging
  - default metric reporter (jmx)
  - shells (scala, sql)

The flink-dist would NOT include the following libs (and these would be offered for individual download)
  - Hadoop libs
  - the pre-shaded file systems
  - the pre-packaged SQL connectors
  - additional metric reporters
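
The compromise can be read as a classification of components by category. A sketch of that split; the category labels paraphrase the two lists above, while `Component` and `split_distribution` are hypothetical names, not Flink build code:

```python
from dataclasses import dataclass

# Categories kept in flink-dist under the proposed compromise.
BUNDLED = {"core", "cluster adapter", "example", "logging", "jmx reporter", "shell"}
# Categories that would be offered as individual downloads instead.
SEPARATE = {"hadoop", "filesystem", "sql connector", "metric reporter"}


@dataclass(frozen=True)
class Component:
    name: str
    category: str


def split_distribution(components: list[Component]) -> tuple[list[str], list[str]]:
    """Partition component names into (bundled, separate) by category.

    Unknown categories raise an error, so every new component must be
    assigned explicitly instead of silently landing in the distribution.
    """
    bundled: list[str] = []
    separate: list[str] = []
    for c in components:
        if c.category in BUNDLED:
            bundled.append(c.name)
        elif c.category in SEPARATE:
            separate.append(c.name)
        else:
            raise ValueError(f"unclassified component: {c.name} ({c.category})")
    return bundled, separate
```

For example, `split_distribution([Component("flink-runtime", "core"), Component("s3-filesystem", "filesystem")])` returns `(["flink-runtime"], ["s3-filesystem"])`. Failing on unclassified components is one way to keep a lean dist from growing again unnoticed.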


On Tue, Jan 22, 2019 at 3:19 AM Jeff Zhang <[hidden email]> wrote:

Re: [DISCUSS] Towards a leaner flink-dist

Ufuk Celebi
I like the idea of a leaner binary distribution. At the same time I
agree with Jamie that the current binary is quite convenient and
connection speeds should not be that big of a deal. Since the binary
distribution is one of the first entry points for users, I'd like to
keep it as user-friendly as possible.

What do you think about building, for releases, a lean distribution by
default plus a "full" distribution that still bundles all the optional
dependencies? (If you don't think that's feasible, I'm still +1 for going
with only the "lean dist" approach.)

– Ufuk

On Wed, Jan 23, 2019 at 9:36 AM Stephan Ewen <[hidden email]> wrote:

>
> There are some points where a leaner approach could help.
> There are many libraries and connectors that are currently being adding to
> Flink, which makes the "include all" approach not completely feasible in
> long run:
>
>   - Connectors: For a proper experience with the Shell/CLI (for example for
> SQL) we need a lot of fat connector jars.
>     These come often for multiple versions, which alone accounts for 100s
> of MBs of connector jars.
>   - The pre-bundled FileSystems are also on the verge of adding 100s of MBs
> themselves.
>   - The metric reporters are bit by bit growing as well.
>
> The following could be a compromise:
>
> The flink-dist would include
>   - the core flink libraries (core, apis, runtime, etc.)
>   - yarn / mesos  etc. adapters
>   - examples (the examples should be a small set of self-contained programs
> without additional dependencies)
>   - default logging
>   - default metric reporter (jmx)
>   - shells (scala, sql)
>
> The flink-dist would NOT include the following libs (and these would be
> offered for individual download)
>   - Hadoop libs
>   - the pre-shaded file systems
>   - the pre-packaged SQL connectors
>   - additional metric reporters
>
>
> On Tue, Jan 22, 2019 at 3:19 AM Jeff Zhang <[hidden email]> wrote:
>
> > Thanks Chesnay for raising this discussion thread.  I think there are 3
> > major use scenarios for flink binary distribution.
> >
> > 1. Use it to set up standalone cluster
> > 2. Use it to experience features of flink, such as via scala-shell,
> > sql-client
> > 3. Downstream project use it to integrate with their system
> >
> > I did a size estimation of flink dist folder, lib folder take around 100M
> > and opt folder take around 200M. Overall I agree to make a thin flink dist.
> > So the next problem is which components to drop. I check the opt folder,
> > and I think the filesystem components and metrics components could be moved
> > out. Because they are pluggable components and is only used in scenario 1 I
> > think (setting up standalone cluster). Other components like flink-table,
> > flink-ml, flnk-gellay, we should still keep them IMHO, because new user may
> > still use it to try the features of flink. For me, scala-shell is the first
> > option to try new features of flink.
> >
> >
> >
> > Fabian Hueske <[hidden email]> 于2019年1月18日周五 下午7:34写道:
> >
> >> Hi Chesnay,
> >>
> >> Thank you for the proposal.
> >> I think this is a good idea.
> >> We follow a similar approach already for Hadoop dependencies and
> >> connectors (although in application space).
> >>
> >> +1
> >>
> >> Fabian
> >>
> >> Am Fr., 18. Jan. 2019 um 10:59 Uhr schrieb Chesnay Schepler <
> >> [hidden email]>:
> >>
> >>> Hello,
> >>>
> >>> the binary distribution that we release by now contains quite a lot of
> >>> optional components, including various filesystems, metric reporters and
> >>> libraries. Most users will only use a fraction of these, and as such
> >>> pretty much only increase the size of flink-dist.
> >>>
> >>> With Flink growing more and more in scope I don't believe it to be
> >>> feasible to ship everything we have with every distribution, and instead
> >>> suggest more of a "pick-what-you-need" model, where flink-dist is rather
> >>> lean and additional components are downloaded separately and added by
> >>> the user.
> >>>
> >>> This would primarily affect the /opt directory, but could also be
> >>> extended to cover flink-dist. For example, the yarn and mesos code could
> >>> be spliced out into separate jars that could be added to lib manually.
> >>>
> >>> Let me know what you think.
> >>>
> >>> Regards,
> >>>
> >>> Chesnay
> >>>
> >>>
> >
> > --
> > Best Regards
> >
> > Jeff Zhang
> >
Reply | Threaded
Open this post in threaded view
|

Re: [DISCUSS] Towards a leaner flink-dist

Timo Walther
+1 for Stephan's suggestion. For example, SQL connectors have never been
part of the main distribution and nobody has complained about this so far. I
think what is more important than a big dist bundle is a helpful
"Downloads" page where users can easily find available filesystems,
connectors, and metric reporters. Not everyone checks Maven Central for
available JAR files. I just saw that we recently added an "Optional components"
section [1]; we just need to make it more prominent. This is
also done for the SQL connectors and formats [2].

[1] https://flink.apache.org/downloads.html
[2] https://ci.apache.org/projects/flink/flink-docs-release-1.7/dev/table/connect.html#dependencies

Regards,
Timo


On 23.01.19 at 10:07, Ufuk Celebi wrote:

Re: [DISCUSS] Towards a leaner flink-dist

Ufuk Celebi
In reply to this post by Ufuk Celebi
On Wed, Jan 23, 2019 at 11:01 AM Timo Walther <[hidden email]> wrote:
> I think what is more important than a big dist bundle is a helpful
> "Downloads" page where users can easily find available filesystems,
> connectors, and metric reporters. Not everyone checks Maven Central for
> available JAR files. I just saw that we added an "Optional components"
> section recently [1], we just need to make it more prominent. This is
> also done for the SQL connectors and formats [2].

+1 I fully agree with the importance of the Downloads page. We
definitely need to make any optional dependencies that users need to
download easy to find.

Re: [DISCUSS] Towards a leaner flink-dist

Till Rohrmann
Ufuk's proposal (having a lean default release and a user convenience tarball) sounds good to me. That way advanced users won't be bothered by an unnecessarily large release and new users can benefit from having many useful extensions bundled in one tarball.

Cheers,
Till

On Wed, Jan 23, 2019 at 3:42 PM Ufuk Celebi <[hidden email]> wrote:

Re: [DISCUSS] Towards a leaner flink-dist

Thomas Weise
+1 for trimming the size by default and offering the fat distribution as an alternative download


On Wed, Jan 23, 2019 at 8:35 AM Till Rohrmann <[hidden email]> wrote:

Re: [DISCUSS] Towards a leaner flink-dist

phoenixjiangnan
+1 for a leaner distribution and a better 'download' webpage.

+1 for a full distribution in addition to the leaner one, if we can automate it. If we support both, I'd imagine release managers should be able to package both distributions by changing a single parameter, instead of packaging the full distribution manually. How to achieve that needs to be evaluated and discussed; it could perhaps be something like 'mvn clean install -Dfull/-Dlean', but I'm not sure yet.
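
The single-parameter idea could look roughly like this: one manifest, with a variant parameter choosing which entries get packaged. This sketches only the selection logic; `Variant`, `MANIFEST` and `entries_for` are hypothetical, and nothing here reflects how Flink's Maven build is actually configured (the `-Dfull`/`-Dlean` flags above are themselves only a suggestion):

```python
from enum import Enum


class Variant(Enum):
    LEAN = "lean"
    FULL = "full"


# Hypothetical manifest: archive path -> whether the entry is optional.
MANIFEST = {
    "bin/flink": False,
    "lib/flink-dist.jar": False,
    "opt/flink-metrics-prometheus.jar": True,
    "opt/flink-s3-fs-hadoop.jar": True,
}


def entries_for(variant: Variant, manifest: dict[str, bool]) -> list[str]:
    """Return the sorted archive entries to package for the given variant."""
    return sorted(
        path
        for path, optional in manifest.items()
        if variant is Variant.FULL or not optional
    )


# Both archives are derived from one manifest; only the parameter differs.
lean_entries = entries_for(Variant.LEAN, MANIFEST)
full_entries = entries_for(Variant.FULL, MANIFEST)
```

Selecting files is only part of the problem: licensing files such as NOTICE would also have to match the contents of each variant.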


On Wed, Jan 23, 2019 at 10:11 AM Thomas Weise <[hidden email]> wrote:

Re: [DISCUSS] Towards a leaner flink-dist

Jark Wu
+1 for the leaner distribution and for improving the "Download" page.

On Fri, 25 Jan 2019 at 01:54, Bowen Li <[hidden email]> wrote:

Re: [DISCUSS] Towards a leaner flink-dist

jincheng sun
In reply to this post by Chesnay Schepler
Hi Chesnay,

Thank you for the proposal, I like it very much.

+1 for the leaner distribution.

Regarding improving the "Download" page, I think we can add the connector download links to the "Optional components" section that @Timo Walther <[hidden email]> mentioned above.


Regards,
Jincheng

On Fri, Jan 18, 2019 at 5:59 PM Chesnay Schepler <[hidden email]> wrote:

Re: [DISCUSS] Towards a leaner flink-dist

Hequn Cheng
Hi Chesnay,

Thanks a lot for the proposal! +1 for a leaner flink-dist and for improving the "Download" page.
I think a leaner flink-dist would be very helpful. If we bundle all jars into a single one, this can easily cause class conflicts.
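
One way to make the class-conflict risk concrete is to list class files that appear in more than one jar. A minimal sketch using Python's standard `zipfile` module (jars are zip archives); `duplicate_classes` is a hypothetical helper, and it only detects duplicate entry names, not whether the duplicated classes actually differ:

```python
import zipfile
from collections import defaultdict
from pathlib import Path


def duplicate_classes(jars: list[Path]) -> dict[str, list[str]]:
    """Map each .class entry found in more than one jar to the jars containing it."""
    seen: defaultdict[str, list[str]] = defaultdict(list)
    for jar in jars:
        with zipfile.ZipFile(jar) as zf:
            for name in zf.namelist():
                if name.endswith(".class"):
                    seen[name].append(jar.name)
    return {cls: owners for cls, owners in seen.items() if len(owners) > 1}
```

Running it as `duplicate_classes(sorted(Path("lib").glob("*.jar")))` against a distribution's lib directory would list every class name shipped more than once.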

Best,
Hequn


On Fri, Jan 25, 2019 at 2:48 PM jincheng sun <[hidden email]> wrote:

Re: [DISCUSS] Towards a leaner flink-dist

Becket Qin
Hi Chesnay,

Thanks for the proposal. +1 for making the distribution thinner.

Meanwhile, it would be useful to have all the peripheral libraries/jars hosted somewhere so users can download them from a centralized place. We can also encourage the community to contribute their libraries, such as connectors and other pluggable components, to the same place (maybe in a separate category), so the community can share commonly used libraries as well.

Thanks,

Jiangjie (Becket) Qin

On Sat, Jan 26, 2019 at 2:49 PM Hequn Cheng <[hidden email]> wrote:

Re: [DISCUSS] Towards a leaner flink-dist

Chesnay Schepler
In reply to this post by Jark Wu
It is not viable for us, as of right now, to release both a lean and a fat
version of flink-dist.
We don't have the required tooling to assemble a correct NOTICE file for
that scenario.

Besides that, this would also go against recent efforts to reduce the
total size of a Flink release, as we'd be increasing the total size again
by roughly 60% (and naturally also the compile time of releases), which
I'd like to avoid.

I like Stephan's compromise of excluding reporters and file systems; this
removes more than 100 MB from the distribution yet still retains all the
user-facing APIs.

Do note that Hadoop will already not be included in the convenience
binaries for 1.8. This was the motivation behind the new section on the
download page.

On 25.01.2019 06:42, Jark Wu wrote:
