Flink packaging makes life hard for SBT fat jars

Flink packaging makes life hard for SBT fat jars

shikhar
Repro at https://github.com/shikhar/flink-sbt-fatjar-troubles, run `sbt assembly`

A fat jar seems like the best way to provide jobs for Flink to execute.

I am declaring deps like:
```
"org.apache.flink" %% "flink-clients" % "1.0-SNAPSHOT" % "provided"
"org.apache.flink" %% "flink-streaming-scala" % "1.0-SNAPSHOT" % "provided"
"org.apache.flink" %% "flink-connector-kafka-0.8" % "1.0-SNAPSHOT"
```

Connectors aren't included in the distribution, so I can't mark the Kafka connector as 'provided'.

Using the sbt-assembly plugin and running the 'assembly' task, I get lots of failures like:

```
[error] deduplicate: different file contents found in the following:
[error] /Users/shikhar/.ivy2/cache/org.apache.flink/flink-connector-kafka-0.8_2.11/jars/flink-connector-kafka-0.8_2.11-1.0-SNAPSHOT.jar:org/apache/flink/shaded/com/google/common/util/concurrent/package-info.class
[error] /Users/shikhar/.ivy2/cache/org.apache.flink/flink-connector-kafka-base_2.11/jars/flink-connector-kafka-base_2.11-1.0-SNAPSHOT.jar:org/apache/flink/shaded/com/google/common/util/concurrent/package-info.class
[error] /Users/shikhar/.ivy2/cache/org.apache.flink/flink-streaming-java_2.11/jars/flink-streaming-java_2.11-1.0-SNAPSHOT.jar:org/apache/flink/shaded/com/google/common/util/concurrent/package-info.class
[error] /Users/shikhar/.ivy2/cache/org.apache.flink/flink-core/jars/flink-core-1.0-SNAPSHOT.jar:org/apache/flink/shaded/com/google/common/util/concurrent/package-info.class
[error] /Users/shikhar/.ivy2/cache/org.apache.flink/flink-shaded-hadoop2/jars/flink-shaded-hadoop2-1.0-SNAPSHOT.jar:org/apache/flink/shaded/com/google/common/util/concurrent/package-info.class
[error] /Users/shikhar/.ivy2/cache/org.apache.flink/flink-runtime_2.11/jars/flink-runtime_2.11-1.0-SNAPSHOT.jar:org/apache/flink/shaded/com/google/common/util/concurrent/package-info.class
[error] /Users/shikhar/.ivy2/cache/org.apache.flink/flink-java/jars/flink-java-1.0-SNAPSHOT.jar:org/apache/flink/shaded/com/google/common/util/concurrent/package-info.class
[error] /Users/shikhar/.ivy2/cache/org.apache.flink/flink-clients_2.11/jars/flink-clients_2.11-1.0-SNAPSHOT.jar:org/apache/flink/shaded/com/google/common/util/concurrent/package-info.class
[error] /Users/shikhar/.ivy2/cache/org.apache.flink/flink-optimizer_2.11/jars/flink-optimizer_2.11-1.0-SNAPSHOT.jar:org/apache/flink/shaded/com/google/common/util/concurrent/package-info.class
```

I tried declaring a MergeStrategy as per https://github.com/shikhar/flink-sbt-fatjar-troubles/blob/master/build.sbt#L13-L18, which helps with the shading conflicts, but then I get lots of errors from conflicts in `commons-collections` vs `commons-beanutils` vs `commons-beanutils-core`, which are deps pulled in via Flink:

```
[error] deduplicate: different file contents found in the following:
[error] /Users/shikhar/.ivy2/cache/commons-collections/commons-collections/jars/commons-collections-3.2.2.jar:org/apache/commons/collections/FastHashMap.class
[error] /Users/shikhar/.ivy2/cache/commons-beanutils/commons-beanutils/jars/commons-beanutils-1.7.0.jar:org/apache/commons/collections/FastHashMap.class
[error] /Users/shikhar/.ivy2/cache/commons-beanutils/commons-beanutils-core/jars/commons-beanutils-core-1.8.0.jar:org/apache/commons/collections/FastHashMap.class
```
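For reference, the kind of merge strategy that resolves the shaded-class conflicts looks roughly like this (a sketch only, assuming sbt-assembly 0.14.x; the pattern is inferred from the error output above and may differ from the linked build.sbt):

```scala
// build.sbt fragment (sketch). The PathList pattern below is an
// assumption based on the conflicting paths in the error output
// (org/apache/flink/shaded/...); it is not copied from the linked repo.
assemblyMergeStrategy in assembly := {
  // The shaded Guava classes are identical across the Flink jars,
  // so keeping the first copy found is safe.
  case PathList("org", "apache", "flink", "shaded", _*) => MergeStrategy.first
  // Fall back to the plugin's default strategy for everything else.
  case x =>
    val oldStrategy = (assemblyMergeStrategy in assembly).value
    oldStrategy(x)
}
```

Even with a rule like this in place, the commons-collections vs. commons-beanutils conflict needs separate handling, since those are genuinely different artifacts shipping the same class.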

The best way I have found to work around this for now is to also mark the flink-kafka connector as a 'provided' dependency and customize flink-dist to include it :( I'd really rather not create a custom distribution.
Re: Flink packaging makes life hard for SBT fat jars

Stephan Ewen
Hi!

Looks like that experience should be improved.

Do you know why you are getting conflicts on the FastHashMap class, even though the core Flink dependencies are "provided"? Does adding the Kafka connector pull in all the core Flink dependencies?

Concerning the Kafka connector: we did not include the connectors in the distribution, because that would overload the distribution with a huge number of dependencies from the connectors.

Greetings,
Stephan


Re: Flink packaging makes life hard for SBT fat jars

shikhar
Stephan Ewen wrote:
> Do you know why you are getting conflicts on the FastHashMap class, even though the core Flink dependencies are "provided"? Does adding the Kafka connector pull in all the core Flink dependencies?

Yes, the core Flink dependencies are being pulled in transitively from the Kafka connector.
Re: Flink packaging makes life hard for SBT fat jars

Stephan Ewen
Hi!

I know that Till is currently looking into making the SBT experience better. He should have an update in a bit.

We need to check a few corner cases about how SBT and Maven dependencies and types (provided, etc) interact and come up with a plan.

We'll also add an SBT quickstart to the homepage as a result of this, to help make this easier.

Greetings,
Stephan



Re: Flink packaging makes life hard for SBT fat jars

shikhar
In reply to this post by shikhar
This seems to work to generate the assembly, hopefully not missing any required transitive deps:

```
  "org.apache.flink" %% "flink-clients" % flinkVersion % "provided",
  "org.apache.flink" %% "flink-streaming-scala" % flinkVersion % "provided",
  "org.apache.kafka" %% "kafka" % "0.8.2.2",
  ("org.apache.flink" %% "flink-connector-kafka-base" % flinkVersion).intransitive(),
  ("org.apache.flink" %% "flink-connector-kafka-0.8" % flinkVersion).intransitive(),
```
Re: Flink packaging makes life hard for SBT fat jars

Till Rohrmann
In reply to this post by Stephan Ewen
Hi Shikhar,

I just wanted to let you know that we've found the problem with the failing assembly plugin. It was caused by incompatible classes [1, 2]. Once these PRs are merged, the merge problems should be resolved.

By the way, we've now also added an SBT template for Flink projects using giter8, which you might want to check out [3, 4]. This will set up a proper sbt project structure.


Cheers,
Till

Re: Flink packaging makes life hard for SBT fat jars

shikhar
Hi Till,

Thanks so much for sorting this out!

One suggestion: can the Flink template depend on a connector (Kafka?) -- this would verify that assembly works smoothly for a very common use case where you need to include connector JARs.

Cheers,

Shikhar
Re: Flink packaging makes life hard for SBT fat jars

Till Rohrmann
Hi Shikhar,

you're right that including a connector dependency would have let us spot the problem earlier. In fact, any project building a fat jar with SBT would have failed without setting the flink dependencies to provided.

The problem is that the template is a general-purpose template. Thus, it is also used for batch jobs. I fear that by including a connector in the default `build.sbt` file, many users will forget about it and simply include it in their job jars. I admit that I'm not totally consistent with my argumentation here, because we're also including the `flink-scala` and `flink-streaming-scala` dependencies by default. But I would like to keep the initial list of dependencies as lean as possible.

What I've done, however, is to add testing SBT builds with a connector to our release testing tasks. This should catch similar problems to the one you came across.

Cheers,
Till


Re: Flink packaging makes life hard for SBT fat jars

shikhar
Hi Till,

I just tried creating an assembly with RC4:

```
  "org.apache.flink" %% "flink-clients" % flinkVersion % "provided",
  "org.apache.flink" %% "flink-streaming-scala" % flinkVersion % "provided",
  "org.apache.flink" %% "flink-connector-kafka-0.8" % flinkVersion,
```

It actually succeeds in creating the assembly now, which is great. However, I see that it is pulling in the Scala 2.10 version of the Kafka JARs. Perhaps the correct Scala version is not specified in the published POMs for transitive dependencies?

https://repository.apache.org/content/repositories/orgapacheflink-1066/org/apache/flink/flink-connector-kafka-0.8_2.11/1.0.0/flink-connector-kafka-0.8_2.11-1.0.0.pom refers to ${scala.binary.version} -- not sure how that is resolved.
Re: Flink packaging makes life hard for SBT fat jars

Till Rohrmann

Hi Shikhar,

that is a problem we just found out about today. The problem is that `scala.binary.version` was not properly replaced in the parent POM, so it resolves to 2.10 [1]. Max has already opened a PR to fix this problem. With the next release candidate, this should be fixed.

[1] https://issues.apache.org/jira/browse/FLINK-3565

Cheers,
Till



Re: Flink packaging makes life hard for SBT fat jars

shikhar
Thanks Till. I can confirm that things are looking good with RC5. sbt-assembly works well with the flink-kafka connector dependency not marked as "provided".
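For anyone hitting this thread later, the setup that works from the 1.0.0 release onward reduces to something like this (a sketch; the version value and the plugin setup are assumptions, not taken verbatim from the thread):

```scala
// build.sbt (sketch), assuming Flink 1.0.0 and the sbt-assembly plugin.
val flinkVersion = "1.0.0"

libraryDependencies ++= Seq(
  // Provided: these ship in flink-dist, so the cluster supplies them
  // at runtime and they stay out of the fat jar.
  "org.apache.flink" %% "flink-clients" % flinkVersion % "provided",
  "org.apache.flink" %% "flink-streaming-scala" % flinkVersion % "provided",
  // Not provided: connectors are not part of flink-dist and must be
  // bundled into the fat jar.
  "org.apache.flink" %% "flink-connector-kafka-0.8" % flinkVersion
)
```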
Re: Flink packaging makes life hard for SBT fat jars

Till Rohrmann

Great to hear Shikhar :-)

Cheers, Till
