Repro at https://github.com/shikhar/flink-sbt-fatjar-troubles, run `sbt assembly`
A fat jar seems like the best way to provide jobs for Flink to execute. I am declaring deps like:

```
"org.apache.flink" %% "flink-clients" % "1.0-SNAPSHOT" % "provided"
"org.apache.flink" %% "flink-streaming-scala" % "1.0-SNAPSHOT" % "provided"
"org.apache.flink" %% "flink-connector-kafka-0.8" % "1.0-SNAPSHOT"
```

Connectors aren't included in the Flink distribution, so the Kafka connector can't be marked as 'provided'. Using the sbt-assembly plugin and running the 'assembly' task, I get lots of failures like:

```
[error] deduplicate: different file contents found in the following:
[error] /Users/shikhar/.ivy2/cache/org.apache.flink/flink-connector-kafka-0.8_2.11/jars/flink-connector-kafka-0.8_2.11-1.0-SNAPSHOT.jar:org/apache/flink/shaded/com/google/common/util/concurrent/package-info.class
[error] /Users/shikhar/.ivy2/cache/org.apache.flink/flink-connector-kafka-base_2.11/jars/flink-connector-kafka-base_2.11-1.0-SNAPSHOT.jar:org/apache/flink/shaded/com/google/common/util/concurrent/package-info.class
[error] /Users/shikhar/.ivy2/cache/org.apache.flink/flink-streaming-java_2.11/jars/flink-streaming-java_2.11-1.0-SNAPSHOT.jar:org/apache/flink/shaded/com/google/common/util/concurrent/package-info.class
[error] /Users/shikhar/.ivy2/cache/org.apache.flink/flink-core/jars/flink-core-1.0-SNAPSHOT.jar:org/apache/flink/shaded/com/google/common/util/concurrent/package-info.class
[error] /Users/shikhar/.ivy2/cache/org.apache.flink/flink-shaded-hadoop2/jars/flink-shaded-hadoop2-1.0-SNAPSHOT.jar:org/apache/flink/shaded/com/google/common/util/concurrent/package-info.class
[error] /Users/shikhar/.ivy2/cache/org.apache.flink/flink-runtime_2.11/jars/flink-runtime_2.11-1.0-SNAPSHOT.jar:org/apache/flink/shaded/com/google/common/util/concurrent/package-info.class
[error] /Users/shikhar/.ivy2/cache/org.apache.flink/flink-java/jars/flink-java-1.0-SNAPSHOT.jar:org/apache/flink/shaded/com/google/common/util/concurrent/package-info.class
[error] /Users/shikhar/.ivy2/cache/org.apache.flink/flink-clients_2.11/jars/flink-clients_2.11-1.0-SNAPSHOT.jar:org/apache/flink/shaded/com/google/common/util/concurrent/package-info.class
[error] /Users/shikhar/.ivy2/cache/org.apache.flink/flink-optimizer_2.11/jars/flink-optimizer_2.11-1.0-SNAPSHOT.jar:org/apache/flink/shaded/com/google/common/util/concurrent/package-info.class
```

I tried declaring a MergeStrategy as per https://github.com/shikhar/flink-sbt-fatjar-troubles/blob/master/build.sbt#L13-L18, which helps with the shading conflicts, but then I get lots of errors from conflicts between `commons-collections`, `commons-beanutils`, and `commons-beanutils-core`, which are deps pulled in transitively via Flink:

```
[error] deduplicate: different file contents found in the following:
[error] /Users/shikhar/.ivy2/cache/commons-collections/commons-collections/jars/commons-collections-3.2.2.jar:org/apache/commons/collections/FastHashMap.class
[error] /Users/shikhar/.ivy2/cache/commons-beanutils/commons-beanutils/jars/commons-beanutils-1.7.0.jar:org/apache/commons/collections/FastHashMap.class
[error] /Users/shikhar/.ivy2/cache/commons-beanutils/commons-beanutils-core/jars/commons-beanutils-core-1.8.0.jar:org/apache/commons/collections/FastHashMap.class
```

The best workaround I have found so far is to also mark the flink-kafka connector as a 'provided' dependency and customize flink-dist to include it :( I'd really rather not create a custom distribution.
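For reference, a minimal sketch of what such a merge strategy can look like with sbt-assembly (the exact rules in the linked build.sbt may differ; the `MergeStrategy.first` choices are assumptions that happen to silence both sets of conflicts above, at the cost of picking one copy of each class arbitrarily):

```scala
// build.sbt -- a sketch only, not the rules from the linked repo.
// Picks the first copy of Flink's shaded Guava classes and of the
// commons-collections/beanutils duplicates to resolve the deduplicate errors.
assemblyMergeStrategy in assembly := {
  case PathList("org", "apache", "flink", "shaded", xs @ _*)        => MergeStrategy.first
  case PathList("org", "apache", "commons", "collections", xs @ _*) => MergeStrategy.first
  case x =>
    // Fall back to sbt-assembly's default strategy for everything else.
    val oldStrategy = (assemblyMergeStrategy in assembly).value
    oldStrategy(x)
}
```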
Hi!

Looks like that experience should be improved. Do you know why you are getting conflicts on the FastHashMap class, even though the core Flink dependencies are "provided"? Does adding the Kafka connector pull in all the core Flink dependencies?

Concerning the Kafka connector: we did not include the connectors in the distribution, because that would overload the distribution with a huge number of dependencies from the connectors.

Greetings,
Stephan
Yes, the core Flink dependencies are being pulled in transitively from the Kafka connector. |
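For anyone who wants to verify this themselves, one way to inspect the resolved graph is the third-party sbt-dependency-graph plugin (the plugin version below is an assumption; check the plugin's README for a current one):

```scala
// project/plugins.sbt -- adds the third-party sbt-dependency-graph plugin;
// the version here is an assumption.
addSbtPlugin("net.virtual-void" % "sbt-dependency-graph" % "0.8.2")
```

After a reload, running `sbt dependencyTree` prints the resolved dependency tree, which should show flink-core, flink-runtime, etc. nested under flink-connector-kafka-0.8.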
Hi!

I know that Till is currently looking into making the SBT experience better. He should have an update in a bit. We need to check a few corner cases about how SBT and Maven dependencies and scopes (provided, etc.) interact and come up with a plan.

We'll also add an SBT quickstart to the homepage as a result of this, to help make this easier.

Greetings,
Stephan
This seems to work to generate the assembly, hopefully without missing any required transitive deps:
``` "org.apache.flink" %% "flink-clients" % flinkVersion % "provided", "org.apache.flink" %% "flink-streaming-scala" % flinkVersion % "provided", "org.apache.kafka" %% "kafka" % "0.8.2.2", ("org.apache.flink" %% "flink-connector-kafka-base" % flinkVersion).intransitive(), ("org.apache.flink" %% "flink-connector-kafka-0.8" % flinkVersion).intransitive(), ``` |
Hi Shikhar,

I just wanted to let you know that we've found the problem with the failing assembly plugin. It was caused by incompatible classes [1, 2]. Once these PRs are merged, the merge problems should be resolved.

By the way, we've now also added an SBT template for Flink projects using giter8, which you might want to check out [3, 4]. This will set up a proper sbt project structure.

Cheers,
Till
|
Hi Till,
Thanks so much for sorting this out! One suggestion: could the Flink template depend on a connector (Kafka?) -- this would verify that assembly works smoothly for the very common use case where you need to include connector JARs.

Cheers,
Shikhar
Hi Shikhar,

you're right that including a connector dependency would have let us spot the problem earlier. In fact, any project building a fat jar with SBT would have failed without setting the Flink dependencies to provided.

The problem is that the template is a general-purpose template. Thus, it is also used for batch jobs. I fear that by including a connector in the default `build.sbt` file, many users will forget about it and simply include it in their job jars. I admit that I'm not totally consistent with my argumentation here, because we're also including the `flink-scala` and `flink-streaming-scala` dependencies per default. But I would like to keep the initial list of dependencies as lean as possible.

What I've done, however, is to add testing SBT builds with a connector to our release testing tasks. This should catch similar problems to the one you came across.

Cheers,
Till
Hi Till,
I just tried creating an assembly with RC4:

```
"org.apache.flink" %% "flink-clients" % flinkVersion % "provided",
"org.apache.flink" %% "flink-streaming-scala" % flinkVersion % "provided",
"org.apache.flink" %% "flink-connector-kafka-0.8" % flinkVersion,
```

It actually succeeds in creating the assembly now, which is great. However, I see that it is pulling in the Scala 2.10 version of the Kafka JARs. Perhaps the correct Scala version is not specified in the published POMs for transitive dependencies? https://repository.apache.org/content/repositories/orgapacheflink-1066/org/apache/flink/flink-connector-kafka-0.8_2.11/1.0.0/flink-connector-kafka-0.8_2.11-1.0.0.pom refers to ${scala.binary.version} -- not sure how that is resolved.
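Until the POMs are fixed, one possible workaround (a sketch only; the excluded artifact name is an assumption based on the symptom above) is to exclude the mis-resolved 2.10 Kafka artifact and pin the 2.11 build explicitly:

```scala
// build.sbt -- sketch of a possible workaround; kafka_2.10 as the offending
// artifact is an assumption inferred from the symptom described above.
libraryDependencies += ("org.apache.flink" %% "flink-connector-kafka-0.8" % flinkVersion)
  .exclude("org.apache.kafka", "kafka_2.10")
// %% appends the project's own Scala binary version, i.e. kafka_2.11 here.
libraryDependencies += "org.apache.kafka" %% "kafka" % "0.8.2.2"
```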
Hi Shikhar,

that is a problem we just found out today. The problem is that the `${scala.binary.version}` property is not properly resolved in the deployed POMs, so the Scala 2.10 dependencies are pulled in. I've opened an issue for it [1].

[1] https://issues.apache.org/jira/browse/FLINK-3565

Cheers,
Till
Thanks Till. I can confirm that things are looking good with RC5. sbt-assembly works well with the flink-kafka connector dependency not marked as "provided".
|
Great to hear, Shikhar :-)

Cheers,
Till