(DEPRECATED) Apache Flink User Mailing List archive.

Re: [DISCUSS] have separate Flink distributions with built-in Hive dependencies

phoenixjiangnan

Re: [DISCUSS] have separate Flink distributions with built-in Hive dependencies

cc user ML in case anyone wants to chime in

On Fri, Dec 13, 2019 at 00:44 Bowen Li <[hidden email]> wrote:

Hi all,

I'd like to propose providing a couple of separate Flink distributions that bundle Hive dependencies for specific Hive versions (2.3.4 and 1.2.1). These distributions would be offered to users on the Flink download page [1].

A few reasons to do this:

1) Flink-Hive integration is important to many Flink and Hive users in two dimensions:
a) for Flink metadata: HiveCatalog is the only persistent catalog for managing Flink tables. With Flink 1.10 supporting more DDL, the persistent catalog will play an even more critical role in users' workflows.
b) for Flink data: the Hive data connector (source/sink) helps both Flink and Hive users unlock new use cases in streaming, near-realtime/realtime data warehousing, backfill, etc.

2) Currently users have to go through a *really* tedious process to get started, because it requires lots of extra jars (see [2]) that are absent from Flink's lean distribution. We've had many users from the public mailing list, private email, and DingTalk groups who got frustrated after spending lots of time figuring out the jars themselves. They would rather have an "out of the box" quickstart experience, and play with the catalog and source/sink without hassle.

3) It's easier for users to swap in the Hive dependencies for their own Hive versions: they just replace the bundled jars with the right versions, with no need to look up the docs.

* Hive 2.3.4 and 1.2.1 are two versions that represent a large part of the user base, which is why we use them as examples for dependencies in [1], even though we now support almost all Hive versions [3].

I'd like to hear what the community thinks about this, and how to achieve it if we believe that's the way to go.

Cheers,
Bowen

[1] https://flink.apache.org/downloads.html
[2] https://ci.apache.org/projects/flink/flink-docs-master/dev/table/hive/#dependencies
[3] https://ci.apache.org/projects/flink/flink-docs-master/dev/table/hive/#supported-hive-versions

Jeff Zhang

Re: [DISCUSS] have separate Flink distributions with built-in Hive dependencies

+1, this is definitely necessary for a better user experience. Setting up the environment is always painful for many big data tools.

On Fri, Dec 13, 2019 at 5:02 PM Bowen Li <[hidden email]> wrote:

cc user ML in case anyone wants to chime in

Best Regards

Jeff Zhang