Why does flink-quickstart-scala suggest adding connector dependencies in the default scope, while the Flink Hive integration docs suggest the opposite?


Why does flink-quickstart-scala suggest adding connector dependencies in the default scope, while the Flink Hive integration docs suggest the opposite?

Yik San Chan
The question is cross-posted on Stack Overflow https://stackoverflow.com/questions/67001326/why-does-flink-quickstart-scala-suggests-adding-connector-dependencies-in-the-de.

## Connector dependencies should be in default scope

This is what [flink-quickstart-scala](https://github.com/apache/flink/blob/d12eeedfac6541c3a0711d1580ce3bd68120ca90/flink-quickstart/flink-quickstart-scala/src/main/resources/archetype-resources/pom.xml#L84) suggests:

```
<!-- Add connector dependencies here. They must be in the default scope (compile). -->

<!-- Example:
<dependency>
  <groupId>org.apache.flink</groupId>
  <artifactId>flink-connector-kafka_${scala.binary.version}</artifactId>
  <version>${flink.version}</version>
</dependency>
-->
```

It also aligns with [Flink project configuration](https://ci.apache.org/projects/flink/flink-docs-release-1.12/dev/project-configuration.html#adding-connector-and-library-dependencies):

> We recommend packaging the application code and all its required dependencies into one jar-with-dependencies which we refer to as the application jar. The application jar can be submitted to an already running Flink cluster, or added to a Flink application container image.
>
> Important: For Maven (and other build tools) to correctly package the dependencies into the application jar, these application dependencies must be specified in scope compile (unlike the core dependencies, which must be specified in scope provided).
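
Concretely, I understand this to mean something like the following in the quickstart pom (a sketch; `flink-streaming-scala` stands in for the core dependencies, and the Kafka connector is just an example):

```
<!-- Core Flink dependency: provided scope, since the cluster already ships it. -->
<dependency>
  <groupId>org.apache.flink</groupId>
  <artifactId>flink-streaming-scala_${scala.binary.version}</artifactId>
  <version>${flink.version}</version>
  <scope>provided</scope>
</dependency>

<!-- Connector dependency: default (compile) scope, so it gets packaged into the application jar. -->
<dependency>
  <groupId>org.apache.flink</groupId>
  <artifactId>flink-connector-kafka_${scala.binary.version}</artifactId>
  <version>${flink.version}</version>
</dependency>
```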

## Hive connector dependencies should be in provided scope

However, the [Flink Hive Integration docs](https://ci.apache.org/projects/flink/flink-docs-stable/dev/table/connectors/hive/#program-maven) suggest the opposite:

> If you are building your own program, you need the following dependencies in your mvn file. It’s recommended not to include these dependencies in the resulting jar file. You’re supposed to add dependencies as stated above at runtime.
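
If I follow that, the Hive-related entries in the pom would look roughly like this (a sketch based on the artifacts the Hive docs list; `${hive.version}` is just a placeholder for whatever Hive version is in use):

```
<!-- Flink's Hive connector: provided, since the jar is expected to be available on the cluster at runtime. -->
<dependency>
  <groupId>org.apache.flink</groupId>
  <artifactId>flink-connector-hive_${scala.binary.version}</artifactId>
  <version>${flink.version}</version>
  <scope>provided</scope>
</dependency>

<!-- Hive itself, also provided at runtime rather than bundled. -->
<dependency>
  <groupId>org.apache.hive</groupId>
  <artifactId>hive-exec</artifactId>
  <version>${hive.version}</version>
  <scope>provided</scope>
</dependency>
```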

## Why?

Thanks!

Best,
Yik San
Re: Why does flink-quickstart-scala suggest adding connector dependencies in the default scope, while the Flink Hive integration docs suggest the opposite?

Till Rohrmann
Hi Yik San,

For future reference, I'm copying my answer from Stack Overflow here:

The reason for this difference is that for Hive it is recommended to start the cluster with the respective Hive dependencies. The documentation [1] states that it's best to put the dependencies into the lib directory before you start the cluster. That way the cluster is enabled to run jobs which use Hive. At the same time, you don't have to bundle this dependency in the user jar which reduces its size. However, there shouldn't be anything preventing you from bundling the Hive dependency with your user code if you want to.


Cheers,
Till

Re: Why does flink-quickstart-scala suggest adding connector dependencies in the default scope, while the Flink Hive integration docs suggest the opposite?

Yik San Chan
Hi Till, I have 2 follow-ups.

(1) Why is Hive special? For connectors such as Kafka, the docs suggest simply bundling the connector dependency with my user code.

(2) It seems the document misses the "before you start the cluster" part - does changing the lib directory always require a cluster restart?


Thanks.

Best,
Yik San

Re: Why does flink-quickstart-scala suggest adding connector dependencies in the default scope, while the Flink Hive integration docs suggest the opposite?

Till Rohrmann
Hi Yik San,

(1) You could do the same with Kafka. For Hive, I believe the dependency is simply quite large, so it hurts more if you bundle it with your user code.

(2) If you change the content in the lib directory, then you have to restart the cluster.

Cheers,
Till

Re: Why does flink-quickstart-scala suggest adding connector dependencies in the default scope, while the Flink Hive integration docs suggest the opposite?

Yik San Chan
Thank you, Till!
