Setup of Scala/Flink project using Bazel

Setup of Scala/Flink project using Bazel

Salva Alcántara
I am trying to set up a simple Flink application from scratch using Bazel.
I've bootstrapped the project by running

```
sbt new tillrohrmann/flink-project.g8
```

and after that I have added some files so that Bazel takes control of
the build (i.e., to migrate from sbt). This is what the `WORKSPACE` file looks
like:

```
# WORKSPACE
load("@bazel_tools//tools/build_defs/repo:http.bzl", "http_archive")

skylib_version = "1.0.3"
http_archive(
    name = "bazel_skylib",
    sha256 = "1c531376ac7e5a180e0237938a2536de0c54d93f5c278634818e0efc952dd56c",
    type = "tar.gz",
    url = "https://mirror.bazel.build/github.com/bazelbuild/bazel-skylib/releases/download/{}/bazel-skylib-{}.tar.gz".format(skylib_version, skylib_version),
)

rules_scala_version = "5df8033f752be64fbe2cedfd1bdbad56e2033b15"

http_archive(
    name = "io_bazel_rules_scala",
    sha256 = "b7fa29db72408a972e6b6685d1bc17465b3108b620cb56d9b1700cf6f70f624a",
    strip_prefix = "rules_scala-%s" % rules_scala_version,
    type = "zip",
    url = "https://github.com/bazelbuild/rules_scala/archive/%s.zip" % rules_scala_version,
)

# Stores the Scala version and other configuration.
# 2.12 is the default version; other versions can be used by passing them explicitly:
load("@io_bazel_rules_scala//:scala_config.bzl", "scala_config")
scala_config(scala_version = "2.12.11")

load("@io_bazel_rules_scala//scala:scala.bzl", "scala_repositories")
scala_repositories()

load("@io_bazel_rules_scala//scala:toolchains.bzl", "scala_register_toolchains")
scala_register_toolchains()

load("@io_bazel_rules_scala//scala:scala.bzl", "scala_library", "scala_binary", "scala_test")

# optional: set up the ScalaTest toolchain and dependencies
load("@io_bazel_rules_scala//testing:scalatest.bzl", "scalatest_repositories", "scalatest_toolchain")
scalatest_repositories()
scalatest_toolchain()

load("//vendor:workspace.bzl", "maven_dependencies")
maven_dependencies()

load("//vendor:target_file.bzl", "build_external_workspace")
build_external_workspace(name = "vendor")
```

and this is the `BUILD` file

```bazel
package(default_visibility = ["//visibility:public"])

load("@io_bazel_rules_scala//scala:scala.bzl", "scala_library",
"scala_test")

scala_library(
    name = "job",
    srcs = glob(["src/main/scala/**/*.scala"]),
    deps = [
        "@vendor//vendor/org/apache/flink:flink_clients",
        "@vendor//vendor/org/apache/flink:flink_scala",
        "@vendor//vendor/org/apache/flink:flink_streaming_scala",
    ]
)
```

I'm using [bazel-deps](https://github.com/johnynek/bazel-deps) for vendoring
the dependencies (placed in the `vendor` folder). I have this in my
`dependencies.yaml` file:

```yaml
options:
  buildHeader: [
      "load(\"@io_bazel_rules_scala//scala:scala_import.bzl\", \"scala_import\")",
      "load(\"@io_bazel_rules_scala//scala:scala.bzl\", \"scala_library\", \"scala_binary\", \"scala_test\")",
  ]
  languages: [ "java", "scala:2.12.11" ]
  resolverType: "coursier"
  thirdPartyDirectory: "vendor"
  resolvers:
    - id: "mavencentral"
      type: "default"
      url: https://repo.maven.apache.org/maven2/
  strictVisibility: true
  transitivity: runtime_deps
  versionConflictPolicy: highest

dependencies:
  org.apache.flink:
    flink:
      lang: scala
      version: "1.11.2"
      modules: [clients, scala, streaming-scala] # provided
    flink-connector-kafka:
      lang: java
      version: "0.10.2"
    flink-test-utils:
      lang: java
      version: "0.10.2"
```

For downloading the dependencies, I'm running

```
bazel run //:parse generate -- --repo-root ~/Projects/bazel-flink-scala \
  --sha-file vendor/workspace.bzl --target-file vendor/target_file.bzl \
  --deps dependencies.yaml
```

This runs just fine, but when I then try to build the project

```
bazel build //:job
```

I'm getting this error

```
Starting local Bazel server and connecting to it...
ERROR: Traceback (most recent call last):
        File "/Users/salvalcantara/Projects/me/bazel-flink-scala/WORKSPACE", line 44, column 25, in <toplevel>
                build_external_workspace(name = "vendor")
        File "/Users/salvalcantara/Projects/me/bazel-flink-scala/vendor/target_file.bzl", line 258, column 91, in build_external_workspace
                return build_external_workspace_from_opts(name = name, target_configs = list_target_data(), separator = list_target_data_separator(), build_header = build_header())
        File "/Users/salvalcantara/Projects/me/bazel-flink-scala/vendor/target_file.bzl", line 251, column 40, in list_target_data
                "vendor/org/apache/flink:flink_clients": ["lang||||||scala:2.12.11","name||||||//vendor/org/apache/flink:flink_clients","visibility||||||//visibility:public","kind||||||import","deps|||L|||","jars|||L|||//external:jar/org/apache/flink/flink_clients_2_12","sources|||L|||","exports|||L|||","runtimeDeps|||L|||//vendor/commons_cli:commons_cli|||//vendor/org/slf4j:slf4j_api|||//vendor/org/apache/flink:force_shading|||//vendor/com/google/code/findbugs:jsr305|||//vendor/org/apache/flink:flink_streaming_java_2_12|||//vendor/org/apache/flink:flink_core|||//vendor/org/apache/flink:flink_java|||//vendor/org/apache/flink:flink_runtime_2_12|||//vendor/org/apache/flink:flink_optimizer_2_12","processorClasses|||L|||","generatesApi|||B|||false","licenses|||L|||","generateNeverlink|||B|||false"],
Error: dictionary expression has duplicate key: "vendor/org/apache/flink:flink_clients"
ERROR: error loading package 'external': Package 'external' contains errors
INFO: Elapsed time: 3.644s
INFO: 0 processes.
FAILED: Build did NOT complete successfully (0 packages loaded)
```

Why is that? Can anyone help? It would be great to have detailed instructions
and project templates for Flink/Scala applications using Bazel. I've put
everything together in the following repo:
https://github.com/salvalcantara/bazel-flink-scala, so feel free to send a PR
or whatever.

PS: Also posted on SO:
https://stackoverflow.com/questions/67331792/setup-of-scala-flink-project-using-bazel

Re: Setup of Scala/Flink project using Bazel

Matthias
Hi Salva,
Unfortunately, I have no experience with Bazel, and just by looking at the code you shared I cannot come up with an answer either. Have you checked out the ML thread in [1]? It provides two other examples where users used Bazel in the context of Flink. This might give you some hints on where to look.

Sorry for not being more helpful.

Best,
Matthias

[1] http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/Does-anyone-have-an-example-of-Bazel-working-with-Flink-td35898.html


Re: Setup of Scala/Flink project using Bazel

Salva Alcántara
Hi Matthias,

Thanks a lot for your reply. I am already aware of that reference, but it's
not exactly what I need. What I'd like to have is the typical word count
(hello world) app migrated from sbt to Bazel, in order to use it as a
template for my Flink/Scala apps.

Re: Setup of Scala/Flink project using Bazel

austin.ce
Hey Salva,

This appears to be a bug in the `bazel-deps` tool, caused by mixing Scala and Java dependencies. The tool seems to use the same target name for both, and thus produces duplicate targets (one for Scala and one for Java).

If you look at the dict lines that are reported as conflicting, you'll see the duplicate "vendor/org/apache/flink:flink_clients" target:

        "vendor/org/apache/flink:flink_clients": ["lang||||||java","name||||||//vendor/org/apache/flink:flink_clients", ...],
        "vendor/org/apache/flink:flink_clients": ["lang||||||scala:2.12.11","name||||||//vendor/org/apache/flink:flink_clients", ...],

Can I ask what made you choose the `bazel-deps` tool instead of the official bazelbuild/rules_jvm_external[1]? That might be a bit more verbose, but has better support and supports Scala as well.

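For illustration, a minimal `maven_install` sketch of that approach (assuming the repo has already been fetched as `@rules_jvm_external` via `http_archive`; the coordinates/versions are taken from your `dependencies.yaml`, so treat this as a starting point rather than a drop-in config):

```
# WORKSPACE (sketch) -- resolve Maven artifacts with rules_jvm_external.
load("@rules_jvm_external//:defs.bzl", "maven_install")

maven_install(
    artifacts = [
        # Scala-suffixed Flink artifacts, pinned to the versions from dependencies.yaml.
        "org.apache.flink:flink-clients_2.12:1.11.2",
        "org.apache.flink:flink-scala_2.12:1.11.2",
        "org.apache.flink:flink-streaming-scala_2.12:1.11.2",
    ],
    repositories = ["https://repo.maven.apache.org/maven2/"],
)
```

Targets then become available under the `@maven` repository, e.g. `@maven//:org_apache_flink_flink_clients_2_12`.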

Alternatively, you might look into customizing the target templates for `bazel-deps` to suffix targets with the lang? Something like:

```
_JAVA_LIBRARY_TEMPLATE = """
java_library(
  name = "{name}_java",
..."""

_SCALA_IMPORT_TEMPLATE = """
scala_import(
    name = "{name}_scala",
..."""
```


Best,
Austin

[1] https://github.com/bazelbuild/rules_jvm_external

Re: Setup of Scala/Flink project using Bazel

Salva Alcántara
Hey Austin,

There was no special reason for vendoring using `bazel-deps`, really. I just
took another project as a reference for mine and that project was already
using `bazel-deps`. I am going to give `rules_jvm_external` a try, and
hopefully I can make it work!

Regards,

Salva

Re: Setup of Scala/Flink project using Bazel

austin.ce
Great! Feel free to post back if you run into anything else or come up with a nice template – I agree it would be a nice thing for the community to have.

Best,
Austin

Re: Setup of Scala/Flink project using Bazel

Salva Alcántara
Hi Austin,

I followed your instructions and gave `rules_jvm_external` a try.

Overall, I think I've made some progress, but I'm not quite there yet. I have
followed the link [1] given by Matthias, making the necessary changes to my
repo:

https://github.com/salvalcantara/bazel-flink-scala

In particular, the relevant Bazel `BUILD` file looks like this:

```
package(default_visibility = ["//visibility:public"])

load("@io_bazel_rules_scala//scala:scala.bzl", "scala_library",
"scala_test")

filegroup(
    name = "scala-main-srcs",
    srcs = glob(["*.scala"]),
)

scala_library(
    name = "flink_app",
    srcs = [":scala-main-srcs"],
    deps = [
        "@maven//:org_apache_flink_flink_core",
        "@maven//:org_apache_flink_flink_clients_2_12",
        "@maven//:org_apache_flink_flink_scala_2_12",
        "@maven//:org_apache_flink_flink_streaming_scala_2_12",
        "@maven//:org_apache_flink_flink_streaming_java_2_12",
    ],
)

java_binary(
    name = "word_count",
    srcs = ["//tools/flink:noop"],
    deploy_env = ["//:default_flink_deploy_env"],
    main_class = "org.example.WordCount",
    deps = [
        ":flink_app",
    ],
)
```

The idea is to use `deploy_env` within `java_binary` for providing the Flink
dependencies. This causes those dependencies to be removed from the final
fat jar that one gets by running:

```
bazel build //src/main/scala/org/example:flink_app_deploy.jar
```
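For reference, `deploy_env` expects `java_binary` targets whose runtime classpaths get subtracted from the `_deploy.jar`, which emulates sbt's "provided" scope. A minimal sketch of what the `//:default_flink_deploy_env` target can look like (the exact target in my repo may differ; the `@maven` labels are the ones used above):

```
# BUILD (repo root) -- sketch of the deploy env target.
# This binary is never run; only its runtime classpath matters. Everything
# reachable from it is excluded from any *_deploy.jar that lists it in deploy_env.
java_binary(
    name = "default_flink_deploy_env",
    main_class = "placeholder.DoesNotExist",  # hypothetical, never executed
    runtime_deps = [
        "@maven//:org_apache_flink_flink_core",
        "@maven//:org_apache_flink_flink_clients_2_12",
        "@maven//:org_apache_flink_flink_scala_2_12",
        "@maven//:org_apache_flink_flink_streaming_scala_2_12",
        "@maven//:org_apache_flink_flink_streaming_java_2_12",
    ],
)
```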

The problem now is that the jar still includes the Scala library, which
should also be dropped from the jar, as it is part of the provided
dependencies within the Flink cluster. I am reading the blog post in [2],
without luck yet...

Regards,

Salva

[1]
http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/Does-anyone-have-an-example-of-Bazel-working-with-Flink-td35898.html

[2]
https://yishanhe.net/address-dependency-conflict-for-bazel-built-scala-spark/

Re: Setup of Scala/Flink project using Bazel

austin.ce
Hi Salva,

I think you're almost there. Confusion is definitely not helped by the ADDONS/PROVIDED_ADDONS thingy – I think I tried to get too fancy with that in the linked thread.

I think the only thing you have to do differently is to adjust the target you are building/deploying – instead of `//src/main/scala/org/example:flink_app_deploy.jar`, your target with the provided env applied is `//src/main/scala/org/example:word_count_deploy.jar`. I've verified this in the following ways:

1. Building and checking the JAR itself:

```
bazel build //src/main/scala/org/example:word_count_deploy.jar
jar -tf bazel-bin/src/main/scala/org/example/word_count_deploy.jar | grep flink
```

This shows only the tools/flink/NoOp class.

2. Running the word count jar locally, to ensure the main class is picked up correctly:

```
./bazel-bin/src/main/scala/org/example/word_count
USAGE:
WordCount <hostname> <port>
```

3. Having fun with the Bazel query language[1], inspecting the difference in dependencies between the deploy env and the word_count_deploy.jar:

```
bazel query 'filter("@maven//:org_apache_flink.*", deps(//src/main/scala/org/example:word_count_deploy.jar) except deps(//:default_flink_deploy_env))'
INFO: Empty results
Loading: 0 packages loaded
```

This is to say that there are no Flink dependencies in the deploy JAR that are not accounted for in the deploy env.


So I think you're all good, but let me know if I've misunderstood! Or if you find a better way of doing the provided deps – I'd be very interested!

Best,
Austin

Re: Setup of Scala/Flink project using Bazel

Salva Alcántara
Hi Austin,

Yep, removing Flink dependencies is working well as you pointed out.

The problem now is that I would also need to remove the Scala library... By
inspecting the jar, you will see a lot of Scala-related classes. If you take
a look at the end of the `build.sbt` file, I have:

```
// exclude Scala library from assembly
assembly / assemblyOption := (assembly / assemblyOption).value.copy(includeScala = false)
```

so the fat jar generated by running `sbt assembly` does not contain
Scala-related classes, which are also "provided". You can compare the
Bazel-built jar with the one built by sbt:

```
$ jar tf target/scala-2.12/bazel-flink-scala-assembly-0.1-SNAPSHOT.jar
META-INF/MANIFEST.MF
org/
org/example/
BUILD
log4j.properties
org/example/WordCount$$anon$1$$anon$2.class
org/example/WordCount$$anon$1.class
org/example/WordCount$.class
org/example/WordCount.class
```

Note that there are neither Flink nor Scala classes. In the jar generated by
Bazel, however, I can still see Scala classes...

Re: Setup of Scala/Flink project using Bazel

austin.ce
Yikes, I see what you mean. I also cannot get `neverlink` or adding the org.scala-lang artifacts to the deploy_env to remove them from the uber jar.

I'm not super familiar with sbt/Scala, but do you know how exactly the assembly `includeScala` option works? Is it just a flag that is passed to scalac?

I've found where rules_scala defines how to call `scalac`, but am lost here[1].

Best,
Austin


Re: Setup of Scala/Flink project using Bazel

austin.ce
I know [hidden email] is using `rules_scala` for building Flink apps, perhaps he can help us out here (and hope he doesn't mind the ping).

Re: Setup of Scala/Flink project using Bazel

Salva Alcántara
That would be awesome Austin, thanks again for your help on that. In the
meantime, I also filed an issue in the `rules_scala` repo:
https://github.com/bazelbuild/rules_scala/issues/1268.

Re: Setup of Scala/Flink project using Bazel

Salva Alcántara
Hi Austin,

In the end, I added the following target overrides for Scala:

```
maven_install(
    artifacts = [
        # testing
        maven.artifact(
            group = "com.google.truth",
            artifact = "truth",
            version = "1.0.1",
        ),
    ] + flink_artifacts(
        addons = FLINK_ADDONS,
        scala_version = FLINK_SCALA_VERSION,
        version = FLINK_VERSION,
    ) + flink_testing_artifacts(
        scala_version = FLINK_SCALA_VERSION,
        version = FLINK_VERSION,
    ),
    fetch_sources = True,
    # This override results in Scala-related classes being removed from the
    # deploy jar as required (?)
    override_targets = {
        "org.scala-lang.scala-library": "@io_bazel_rules_scala_scala_library//:io_bazel_rules_scala_scala_library",
        "org.scala-lang.scala-reflect": "@io_bazel_rules_scala_scala_reflect//:io_bazel_rules_scala_scala_reflect",
        "org.scala-lang.scala-compiler": "@io_bazel_rules_scala_scala_compiler//:io_bazel_rules_scala_scala_compiler",
        "org.scala-lang.modules.scala-parser-combinators_%s" % FLINK_SCALA_VERSION: "@io_bazel_rules_scala_scala_parser_combinators//:io_bazel_rules_scala_scala_parser_combinators",
        "org.scala-lang.modules.scala-xml_%s" % FLINK_SCALA_VERSION: "@io_bazel_rules_scala_scala_xml//:io_bazel_rules_scala_scala_xml",
    },
    repositories = MAVEN_REPOSITORIES,
)
```

and now it works as expected, meaning:

```
bazel build //src/main/scala/org/example:word_count_deploy.jar
```

produces a jar with both Flink and Scala-related classes removed (since they
are provided by the runtime). I did a quick check and the Flink job runs
just fine in a local cluster. It would be nice if the community could
confirm that this is indeed the way to build Flink-based Scala
applications...
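
For a quick double-check, the same `jar tf` inspection as before can confirm the provided classes are really gone (assuming the target path used earlier in this thread):

```
# Both greps should print nothing for the deploy jar:
jar tf bazel-bin/src/main/scala/org/example/word_count_deploy.jar | grep '^scala/'
jar tf bazel-bin/src/main/scala/org/example/word_count_deploy.jar | grep '^org/apache/flink/'
```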

BTW, I updated the repo with the above-mentioned overrides:
https://github.com/salvalcantara/bazel-flink-scala, in case you want to give
it a try.