This question is cross-posted on Stack Overflow: https://stackoverflow.com/questions/67001326/why-does-flink-quickstart-scala-suggests-adding-connector-dependencies-in-the-de.
## Connector dependencies should be in default scope

This is what [flink-quickstart-scala](https://github.com/apache/flink/blob/d12eeedfac6541c3a0711d1580ce3bd68120ca90/flink-quickstart/flink-quickstart-scala/src/main/resources/archetype-resources/pom.xml#L84) suggests:

```
<!-- Add connector dependencies here. They must be in the default scope (compile). -->

<!-- Example:

<dependency>
    <groupId>org.apache.flink</groupId>
    <artifactId>flink-connector-kafka_${scala.binary.version}</artifactId>
    <version>${flink.version}</version>
</dependency>
-->
```

It also aligns with the [Flink project configuration docs](https://ci.apache.org/projects/flink/flink-docs-release-1.12/dev/project-configuration.html#adding-connector-and-library-dependencies):

> We recommend packaging the application code and all its required dependencies into one jar-with-dependencies which we refer to as the application jar. The application jar can be submitted to an already running Flink cluster, or added to a Flink application container image.
>
> Important: For Maven (and other build tools) to correctly package the dependencies into the application jar, these application dependencies must be specified in scope compile (unlike the core dependencies, which must be specified in scope provided).

## Hive connector dependencies should be in provided scope

However, the [Flink Hive integration docs](https://ci.apache.org/projects/flink/flink-docs-stable/dev/table/connectors/hive/#program-maven) suggest the opposite:

> If you are building your own program, you need the following dependencies in your mvn file. It's recommended not to include these dependencies in the resulting jar file. You're supposed to add dependencies as stated above at runtime.

## Why?

Thanks!

Best,
Yik San
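P.S. For concreteness, the Hive docs' recommendation amounts to marking the dependency as provided rather than leaving it in the default compile scope. A minimal sketch (artifact and property names follow the quickstart pom conventions, not the docs verbatim):

```
<!-- provided scope: available when compiling the job, but NOT packaged
     into the application jar; the cluster supplies it at runtime -->
<dependency>
    <groupId>org.apache.flink</groupId>
    <artifactId>flink-connector-hive_${scala.binary.version}</artifactId>
    <version>${flink.version}</version>
    <scope>provided</scope>
</dependency>
```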
Hi Yik San,

For future reference, I copy my answer from SO here:

The reason for this difference is that for Hive it is recommended to start the cluster with the respective Hive dependencies. The documentation [1] states that it's best to put the dependencies into the lib directory before you start the cluster. That way the cluster is able to run jobs which use Hive. At the same time, you don't have to bundle this dependency into the user jar, which reduces its size. However, there shouldn't be anything preventing you from bundling the Hive dependency with your user code if you want to.

Cheers,
Till
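P.S. If you do choose to bundle it, the only change on the Maven side is the scope. A sketch, under the same artifact and property naming assumptions as the quickstart pom:

```
<!-- default (compile) scope: the Hive connector gets packaged into the
     application jar by the shade plugin; expect a noticeably larger jar -->
<dependency>
    <groupId>org.apache.flink</groupId>
    <artifactId>flink-connector-hive_${scala.binary.version}</artifactId>
    <version>${flink.version}</version>
</dependency>
```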
Hi Till,

I have two follow-ups.

(1) Why is Hive special, while for connectors such as Kafka the docs suggest simply bundling the Kafka connector dependency with my user code?

(2) It seems the documentation misses the "before you start the cluster" part - does it always require a cluster restart whenever the /lib directory changes?

Thanks.

Best,
Yik San
Hi Yik San,

(1) You could do the same with Kafka. For Hive I believe the dependency is simply quite large, so it hurts more if you bundle it with your user code.

(2) If you change the content of the lib directory, then you have to restart the cluster.

Cheers,
Till
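P.S. The Kafka counterpart would be a sketch like the one below (hypothetical setup: the connector jar would then have to be copied into the cluster's lib directory, followed by a restart):

```
<!-- same pattern as Hive: provided scope keeps the connector out of the
     application jar; the connector jar goes into the cluster's lib/
     directory instead, and the cluster is restarted to pick it up -->
<dependency>
    <groupId>org.apache.flink</groupId>
    <artifactId>flink-connector-kafka_${scala.binary.version}</artifactId>
    <version>${flink.version}</version>
    <scope>provided</scope>
</dependency>
```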
Thank you, Till!