Re: [DISCUSS] Towards a leaner flink-dist

Posted by Timo Walther on
URL: http://deprecated-apache-flink-user-mailing-list-archive.369.s1.nabble.com/DISCUSS-Towards-a-leaner-flink-dist-tp25615p25688.html

+1 for Stephan's suggestion. For example, SQL connectors have never been
part of the main distribution and nobody complained about this so far. I
think what is more important than a big dist bundle is a helpful
"Downloads" page where users can easily find available filesystems,
connectors, and metric reporters. Not everyone checks Maven Central for
available JAR files. I just saw that we added an "Optional components"
section recently [1], we just need to make it more prominent. This is
also done for the SQL connectors and formats [2].

[1] https://flink.apache.org/downloads.html
[2] https://ci.apache.org/projects/flink/flink-docs-release-1.7/dev/table/connect.html#dependencies

Regards,
Timo


On 23.01.19 at 10:07, Ufuk Celebi wrote:

> I like the idea of a leaner binary distribution. At the same time I
> agree with Jamie that the current binary is quite convenient and
> connection speeds should not be that big of a deal. Since the binary
> distribution is one of the first entry points for users, I'd like to
> keep it as user-friendly as possible.
>
> What do you think about building a lean distribution by default and a
> "full" distribution that still bundles all the optional dependencies
> for releases? (If you don't think that's feasible I'm still +1 to only
> go with the "lean dist" approach.)
>
> – Ufuk
>
> On Wed, Jan 23, 2019 at 9:36 AM Stephan Ewen <[hidden email]> wrote:
>> There are some points where a leaner approach could help.
>> There are many libraries and connectors that are currently being added to
>> Flink, which makes the "include all" approach not completely feasible in
>> the long run:
>>
>>    - Connectors: For a proper experience with the Shell/CLI (for example for
>> SQL) we need a lot of fat connector jars.
>>      These often come in multiple versions, which alone accounts for 100s
>> of MBs of connector jars.
>>    - The pre-bundled FileSystems are also on the verge of adding 100s of MBs
>> themselves.
>>    - The metric reporters are bit by bit growing as well.
>>
>> The following could be a compromise:
>>
>> The flink-dist would include
>>    - the core flink libraries (core, apis, runtime, etc.)
>>    - yarn / mesos  etc. adapters
>>    - examples (the examples should be a small set of self-contained programs
>> without additional dependencies)
>>    - default logging
>>    - default metric reporter (jmx)
>>    - shells (scala, sql)
>>
>> The flink-dist would NOT include the following libs (and these would be
>> offered for individual download)
>>    - Hadoop libs
>>    - the pre-shaded file systems
>>    - the pre-packaged SQL connectors
>>    - additional metric reporters
>>
>>
>> On Tue, Jan 22, 2019 at 3:19 AM Jeff Zhang <[hidden email]> wrote:
>>
>>> Thanks Chesnay for raising this discussion thread.  I think there are 3
>>> major use scenarios for the flink binary distribution.
>>>
>>> 1. Use it to set up a standalone cluster
>>> 2. Use it to experience features of flink, such as via scala-shell,
>>> sql-client
>>> 3. Downstream projects use it to integrate with their systems
>>>
>>> I did a size estimation of the flink dist folder: the lib folder takes around
>>> 100M and the opt folder takes around 200M. Overall I agree with making a thin
>>> flink dist. So the next question is which components to drop. I checked the
>>> opt folder, and I think the filesystem components and metrics components
>>> could be moved out, because they are pluggable components and are only used
>>> in scenario 1, I think (setting up a standalone cluster). Other components
>>> like flink-table, flink-ml, and flink-gelly we should still keep IMHO,
>>> because new users may still use them to try the features of flink. For me,
>>> scala-shell is the first option to try new features of flink.
>>>
>>>
>>>
>>> Fabian Hueske <[hidden email]> wrote on Fri, Jan 18, 2019 at 7:34 PM:
>>>
>>>> Hi Chesnay,
>>>>
>>>> Thank you for the proposal.
>>>> I think this is a good idea.
>>>> We follow a similar approach already for Hadoop dependencies and
>>>> connectors (although in application space).
>>>>
>>>> +1
>>>>
>>>> Fabian
>>>>
>>>> On Fri, Jan 18, 2019 at 10:59, Chesnay Schepler <[hidden email]> wrote:
>>>>
>>>>> Hello,
>>>>>
>>>>> the binary distribution that we currently release contains quite a lot of
>>>>> optional components, including various filesystems, metric reporters and
>>>>> libraries. Most users will only use a fraction of these, so the rest
>>>>> pretty much only increases the size of flink-dist.
>>>>>
>>>>> With Flink growing more and more in scope I don't believe it to be
>>>>> feasible to ship everything we have with every distribution, and instead
>>>>> suggest more of a "pick-what-you-need" model, where flink-dist is rather
>>>>> lean and additional components are downloaded separately and added by
>>>>> the user.
>>>>>
>>>>> This would primarily affect the /opt directory, but could also be
>>>>> extended to cover flink-dist. For example, the yarn and mesos code could
>>>>> be split out into separate jars that could be added to lib manually.
>>>>>
>>>>> Let me know what you think.
>>>>>
>>>>> Regards,
>>>>>
>>>>> Chesnay
>>>>>
>>>>>
>>> --
>>> Best Regards
>>>
>>> Jeff Zhang
>>>
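
The size estimate Jeff quotes for the lib and opt folders could be reproduced with something like the following. This is only a minimal sketch: the function names `dir_size_bytes` and `dist_report` are made up for illustration, and the only assumption about the layout is that the unpacked distribution has `lib` and `opt` subfolders, as described in the thread.

```python
import os
import sys


def dir_size_bytes(root):
    """Sum the sizes of all regular files below root (symlinks are skipped)."""
    total = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if not os.path.islink(path):
                total += os.path.getsize(path)
    return total


def dist_report(dist_dir, subdirs=("lib", "opt")):
    """Return {subdir: size in bytes} for each subdir that exists in dist_dir."""
    report = {}
    for sub in subdirs:
        path = os.path.join(dist_dir, sub)
        if os.path.isdir(path):
            report[sub] = dir_size_bytes(path)
    return report


if __name__ == "__main__":
    # Path to an unpacked distribution; defaults to the current directory.
    dist_dir = sys.argv[1] if len(sys.argv) > 1 else "."
    for sub, size in dist_report(dist_dir).items():
        print(f"{sub}: {size / 1e6:.1f} MB")
```

Run against an unpacked binary distribution, this prints one line per folder, which makes it easy to compare a "lean" and a "full" build side by side.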