I am trying to set up a simple Flink application from scratch using Bazel.
I've bootstrapped the project by running

```
sbt new tillrohrmann/flink-project.g8
```

and after that I have added some files in order for Bazel to take control of the building (i.e., migrate from sbt). This is what the `WORKSPACE` file looks like:

```
# WORKSPACE
load("@bazel_tools//tools/build_defs/repo:http.bzl", "http_archive")

skylib_version = "1.0.3"

http_archive(
    name = "bazel_skylib",
    sha256 = "1c531376ac7e5a180e0237938a2536de0c54d93f5c278634818e0efc952dd56c",
    type = "tar.gz",
    url = "https://mirror.bazel.build/github.com/bazelbuild/bazel-skylib/releases/download/{}/bazel-skylib-{}.tar.gz".format(skylib_version, skylib_version),
)

rules_scala_version = "5df8033f752be64fbe2cedfd1bdbad56e2033b15"

http_archive(
    name = "io_bazel_rules_scala",
    sha256 = "b7fa29db72408a972e6b6685d1bc17465b3108b620cb56d9b1700cf6f70f624a",
    strip_prefix = "rules_scala-%s" % rules_scala_version,
    type = "zip",
    url = "https://github.com/bazelbuild/rules_scala/archive/%s.zip" % rules_scala_version,
)

# Stores Scala version and other configuration.
# 2.12 is the default version; other versions can be used by passing them explicitly:
load("@io_bazel_rules_scala//:scala_config.bzl", "scala_config")
scala_config(scala_version = "2.12.11")

load("@io_bazel_rules_scala//scala:scala.bzl", "scala_repositories")
scala_repositories()

load("@io_bazel_rules_scala//scala:toolchains.bzl", "scala_register_toolchains")
scala_register_toolchains()

load("@io_bazel_rules_scala//scala:scala.bzl", "scala_library", "scala_binary", "scala_test")

# optional: setup ScalaTest toolchain and dependencies
load("@io_bazel_rules_scala//testing:scalatest.bzl", "scalatest_repositories", "scalatest_toolchain")
scalatest_repositories()
scalatest_toolchain()

load("//vendor:workspace.bzl", "maven_dependencies")
maven_dependencies()

load("//vendor:target_file.bzl", "build_external_workspace")
build_external_workspace(name = "vendor")
```

and this is the `BUILD` file:

```bazel
package(default_visibility = ["//visibility:public"])

load("@io_bazel_rules_scala//scala:scala.bzl", "scala_library", "scala_test")

scala_library(
    name = "job",
    srcs = glob(["src/main/scala/**/*.scala"]),
    deps = [
        "@vendor//vendor/org/apache/flink:flink_clients",
        "@vendor//vendor/org/apache/flink:flink_scala",
        "@vendor//vendor/org/apache/flink:flink_streaming_scala",
    ],
)
```

I'm using [bazel-deps](https://github.com/johnynek/bazel-deps) for vendoring the dependencies (put in the `vendor` folder).
I have this in my `dependencies.yaml` file:

```yaml
options:
  buildHeader: [
    "load(\"@io_bazel_rules_scala//scala:scala_import.bzl\", \"scala_import\")",
    "load(\"@io_bazel_rules_scala//scala:scala.bzl\", \"scala_library\", \"scala_binary\", \"scala_test\")",
  ]
  languages: [ "java", "scala:2.12.11" ]
  resolverType: "coursier"
  thirdPartyDirectory: "vendor"
  resolvers:
    - id: "mavencentral"
      type: "default"
      url: https://repo.maven.apache.org/maven2/
  strictVisibility: true
  transitivity: runtime_deps
  versionConflictPolicy: highest

dependencies:
  org.apache.flink:
    flink:
      lang: scala
      version: "1.11.2"
      modules: [clients, scala, streaming-scala] # provided
    flink-connector-kafka:
      lang: java
      version: "0.10.2"
    flink-test-utils:
      lang: java
      version: "0.10.2"
```

For downloading the dependencies, I'm running

```
bazel run //:parse generate -- --repo-root ~/Projects/bazel-flink-scala --sha-file vendor/workspace.bzl --target-file vendor/target_file.bzl --deps dependencies.yaml
```

which runs just fine, but then when I try to build the project

```
bazel build //:job
```

I'm getting this error:

```
Starting local Bazel server and connecting to it...
ERROR: Traceback (most recent call last):
	File "/Users/salvalcantara/Projects/me/bazel-flink-scala/WORKSPACE", line 44, column 25, in <toplevel>
		build_external_workspace(name = "vendor")
	File "/Users/salvalcantara/Projects/me/bazel-flink-scala/vendor/target_file.bzl", line 258, column 91, in build_external_workspace
		return build_external_workspace_from_opts(name = name, target_configs = list_target_data(), separator = list_target_data_separator(), build_header = build_header())
	File "/Users/salvalcantara/Projects/me/bazel-flink-scala/vendor/target_file.bzl", line 251, column 40, in list_target_data
		"vendor/org/apache/flink:flink_clients": ["lang||||||scala:2.12.11","name||||||//vendor/org/apache/flink:flink_clients","visibility||||||//visibility:public","kind||||||import","deps|||L|||","jars|||L|||//external:jar/org/apache/flink/flink_clients_2_12","sources|||L|||","exports|||L|||","runtimeDeps|||L|||//vendor/commons_cli:commons_cli|||//vendor/org/slf4j:slf4j_api|||//vendor/org/apache/flink:force_shading|||//vendor/com/google/code/findbugs:jsr305|||//vendor/org/apache/flink:flink_streaming_java_2_12|||//vendor/org/apache/flink:flink_core|||//vendor/org/apache/flink:flink_java|||//vendor/org/apache/flink:flink_runtime_2_12|||//vendor/org/apache/flink:flink_optimizer_2_12","processorClasses|||L|||","generatesApi|||B|||false","licenses|||L|||","generateNeverlink|||B|||false"],
Error: dictionary expression has duplicate key: "vendor/org/apache/flink:flink_clients"
ERROR: error loading package 'external': Package 'external' contains errors
INFO: Elapsed time: 3.644s
INFO: 0 processes.
FAILED: Build did NOT complete successfully (0 packages loaded)
```

Why is that? Can anyone help? It would be great to have detailed instructions and project templates for Flink/Scala applications using Bazel. I've put everything together in the following repo: https://github.com/salvalcantara/bazel-flink-scala, feel free to send a PR or whatever.

PS: Also posted on SO: https://stackoverflow.com/questions/67331792/setup-of-scala-flink-project-using-bazel
Hi Salva,

Unfortunately, I have no experience with Bazel. Just by looking at the code you shared, I cannot come up with an answer either. Have you checked out the ML thread in [1]? It provides two other examples where users used Bazel in the context of Flink. This might give you some hints on where to look. Sorry for not being more helpful.

Best,
Matthias

[1] http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/Does-anyone-have-an-example-of-Bazel-working-with-Flink-td35898.html
Hi Matthias,
Thanks a lot for your reply. I am already aware of that reference, but it's not exactly what I need. What I'd like to have is the typical word count (hello world) app migrated from sbt to Bazel, in order to use it as a template for my Flink/Scala apps.
Hey Salva,

This appears to be a bug in the `bazel-deps` tool, caused by mixing Scala and Java dependencies. The tool seems to use the same target name for both, and thus produces duplicate targets (one for Scala and one for Java). If you look at the dict lines that are reported as conflicting, you'll see the duplicate "vendor/org/apache/flink:flink_clients" key:

```
"vendor/org/apache/flink:flink_clients": ["lang||||||java","name||||||//vendor/org/apache/flink:flink_clients", ...],
"vendor/org/apache/flink:flink_clients": ["lang||||||scala:2.12.11","name||||||//vendor/org/apache/flink:flink_clients", ...],
```

Can I ask what made you choose the `bazel-deps` tool instead of the official bazelbuild/rules_jvm_external [1]? That might be a bit more verbose, but it has better support and supports Scala as well.

Alternatively, you might look into customizing the target templates for `bazel-deps` to suffix targets with the lang. Something like:

```
_JAVA_LIBRARY_TEMPLATE = """
java_library(
    name = "{name}_java",
    ..."""

_SCALA_IMPORT_TEMPLATE = """
scala_import(
    name = "{name}_scala",
    ..."""
```

Best,
Austin

[1] https://github.com/bazelbuild/rules_jvm_external
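PS: In case it helps, here is a rough sketch of what the `rules_jvm_external` setup could look like in your `WORKSPACE` (untested; the release tag below is illustrative, and the artifact coordinates are just the ones implied by your `dependencies.yaml` — in a real setup you'd pin a current release and its sha256):

```bazel
load("@bazel_tools//tools/build_defs/repo:http.bzl", "http_archive")

# Illustrative tag; pick a current release and pin its sha256 in practice.
RULES_JVM_EXTERNAL_TAG = "4.0"

http_archive(
    name = "rules_jvm_external",
    strip_prefix = "rules_jvm_external-%s" % RULES_JVM_EXTERNAL_TAG,
    url = "https://github.com/bazelbuild/rules_jvm_external/archive/%s.zip" % RULES_JVM_EXTERNAL_TAG,
)

load("@rules_jvm_external//:defs.bzl", "maven_install")

maven_install(
    artifacts = [
        # Scala artifacts carry the Scala version suffix explicitly.
        "org.apache.flink:flink-clients_2.12:1.11.2",
        "org.apache.flink:flink-scala_2.12:1.11.2",
        "org.apache.flink:flink-streaming-scala_2.12:1.11.2",
    ],
    repositories = [
        "https://repo.maven.apache.org/maven2",
    ],
)
```

The resolved artifacts then become available as targets like `@maven//:org_apache_flink_flink_clients_2_12`.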
Hey Austin,
There was no special reason for vendoring using `bazel-deps`, really. I just took another project as a reference for mine, and that project was already using `bazel-deps`. I am going to give `rules_jvm_external` a try, and hopefully I can make it work!

Regards,
Salva
Great! Feel free to post back if you run into anything else or come up with a nice template. I agree it would be a nice thing for the community to have.

Best,
Austin
Hi Austin,
I followed your instructions and gave `rules_jvm_external` a try. Overall, I think I advanced a bit, but I'm not quite there yet. I have followed the link [1] given by Matthias, making the necessary changes to my repo: https://github.com/salvalcantara/bazel-flink-scala

In particular, the relevant (Bazel) `BUILD` file looks like this:

```bazel
package(default_visibility = ["//visibility:public"])

load("@io_bazel_rules_scala//scala:scala.bzl", "scala_library", "scala_test")

filegroup(
    name = "scala-main-srcs",
    srcs = glob(["*.scala"]),
)

scala_library(
    name = "flink_app",
    srcs = [":scala-main-srcs"],
    deps = [
        "@maven//:org_apache_flink_flink_core",
        "@maven//:org_apache_flink_flink_clients_2_12",
        "@maven//:org_apache_flink_flink_scala_2_12",
        "@maven//:org_apache_flink_flink_streaming_scala_2_12",
        "@maven//:org_apache_flink_flink_streaming_java_2_12",
    ],
)

java_binary(
    name = "word_count",
    srcs = ["//tools/flink:noop"],
    deploy_env = ["//:default_flink_deploy_env"],
    main_class = "org.example.WordCount",
    deps = [
        ":flink_app",
    ],
)
```

The idea is to use `deploy_env` within `java_binary` for providing the Flink dependencies. This causes those dependencies to be removed from the final fat jar that one gets by running:

```
bazel build //src/main/scala/org/example:flink_app_deploy.jar
```

The problem now is that the jar still includes the Scala library, which should also be dropped from the jar, as it is part of the provided dependencies within the Flink cluster. I am reading the blog post in [2] without luck yet...

Regards,
Salva

[1] http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/Does-anyone-have-an-example-of-Bazel-working-with-Flink-td35898.html
[2] https://yishanhe.net/address-dependency-conflict-for-bazel-built-scala-spark/
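PS: For context, the deploy env in the root `BUILD` file is just a `java_binary` whose runtime deps are the provided Flink artifacts, along the lines of the linked thread. A sketch only — the exact artifact list in my repo may differ, and the `main_class` is a dummy since this target is never actually run:

```bazel
java_binary(
    name = "default_flink_deploy_env",
    # Dummy main class: this binary only serves as a deploy_env,
    # i.e., a set of deps to exclude from the consumer's deploy jar.
    main_class = "does.not.Exist",
    runtime_deps = [
        "@maven//:org_apache_flink_flink_core",
        "@maven//:org_apache_flink_flink_clients_2_12",
        "@maven//:org_apache_flink_flink_scala_2_12",
        "@maven//:org_apache_flink_flink_streaming_scala_2_12",
        "@maven//:org_apache_flink_flink_streaming_java_2_12",
    ],
)
```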
Hi Salva,

I think you're almost there. Confusion is definitely not helped by the ADDONS/PROVIDED_ADDONS thingy; I think I tried to get too fancy with that in the linked thread. I think the only thing you have to do differently is to adjust the target you are building/deploying: instead of `//src/main/scala/org/example:flink_app_deploy.jar`, your target with the provided env applied is `//src/main/scala/org/example:word_count_deploy.jar`.

I've verified this in the following ways:

1. Building and checking the JAR itself:

```
bazel build //src/main/scala/org/example:word_count_deploy.jar
jar -tf bazel-bin/src/main/scala/org/example/word_count_deploy.jar | grep flink
```

This shows only the tools/flink/NoOp class.

2. Running the word count jar locally, to ensure the main class is picked up correctly:

```
./bazel-bin/src/main/scala/org/example/word_count
USAGE: WordCount <hostname> <port>
```

3. Had fun with the Bazel query language [1], inspecting the difference in the dependencies between the deploy env and word_count_deploy.jar:

```
bazel query 'filter("@maven//:org_apache_flink.*", deps(//src/main/scala/org/example:word_count_deploy.jar) except deps(//:default_flink_deploy_env))'
INFO: Empty results
Loading: 0 packages loaded
```

This is to say that there are no Flink dependencies in the deploy JAR that are not accounted for in the deploy env.

So I think you're all good, but let me know if I've misunderstood! Or if you find a better way of doing the provided deps; I'd be very interested!

Best,
Austin
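PS: The same kind of query should work for any other dependency group. For instance, something like this (untested) would list whatever Scala runtime artifacts from Maven still make it into the deploy jar:

```
bazel query 'filter("@maven//:org_scala_lang.*", deps(//src/main/scala/org/example:word_count_deploy.jar))'
```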
Hi Austin,
Yep, removing Flink dependencies is working well, as you pointed out. The problem now is that I would also need to remove the Scala library; by inspecting the jar, you will see a lot of Scala-related classes. If you take a look at the end of the build.sbt file, I have

```
// exclude Scala library from assembly
assembly / assemblyOption := (assembly / assemblyOption).value.copy(includeScala = false)
```

so the fat jar generated by running `sbt assembly` does not contain Scala-related classes, which are also "provided". You can compare the Bazel-built jar with the one built by sbt:

```
$ jar tf target/scala-2.12/bazel-flink-scala-assembly-0.1-SNAPSHOT.jar
META-INF/MANIFEST.MF
org/
org/example/
BUILD
log4j.properties
org/example/WordCount$$anon$1$$anon$2.class
org/example/WordCount$$anon$1.class
org/example/WordCount$.class
org/example/WordCount.class
```

Note that there are neither Flink nor Scala classes. In the jar generated by Bazel, however, I can still see Scala classes...
Yikes, I see what you mean. I also cannot get `neverlink` or adding the org.scala-lang artifacts to the deploy_env to remove them from the uber jar. I'm not super familiar with sbt/Scala, but do you know how exactly the assembly `includeScala` option works? Is it just a flag that is passed to scalac? I've found where rules_scala defines how to call `scalac`, but am lost here [1].

Best,
Austin
I know [hidden email] is using `rules_scala` for building Flink apps; perhaps he can help us out here (and hope he doesn't mind the ping).
That would be awesome, Austin. Thanks again for your help on that. In the meantime, I also filed an issue in the `rules_scala` repo: https://github.com/bazelbuild/rules_scala/issues/1268
Hi Austin,
In the end, I added the following target override for Scala:

```bazel
maven_install(
    artifacts = [
        # testing
        maven.artifact(
            group = "com.google.truth",
            artifact = "truth",
            version = "1.0.1",
        ),
    ] + flink_artifacts(
        addons = FLINK_ADDONS,
        scala_version = FLINK_SCALA_VERSION,
        version = FLINK_VERSION,
    ) + flink_testing_artifacts(
        scala_version = FLINK_SCALA_VERSION,
        version = FLINK_VERSION,
    ),
    fetch_sources = True,
    # This override results in Scala-related classes being removed from the
    # deploy jar as required (?)
    override_targets = {
        "org.scala-lang.scala-library": "@io_bazel_rules_scala_scala_library//:io_bazel_rules_scala_scala_library",
        "org.scala-lang.scala-reflect": "@io_bazel_rules_scala_scala_reflect//:io_bazel_rules_scala_scala_reflect",
        "org.scala-lang.scala-compiler": "@io_bazel_rules_scala_scala_compiler//:io_bazel_rules_scala_scala_compiler",
        "org.scala-lang.modules.scala-parser-combinators_%s" % FLINK_SCALA_VERSION: "@io_bazel_rules_scala_scala_parser_combinators//:io_bazel_rules_scala_scala_parser_combinators",
        "org.scala-lang.modules.scala-xml_%s" % FLINK_SCALA_VERSION: "@io_bazel_rules_scala_scala_xml//:io_bazel_rules_scala_scala_xml",
    },
    repositories = MAVEN_REPOSITORIES,
)
```

and now it works as expected, meaning that

```
bazel build //src/main/scala/org/example:word_count_deploy.jar
```

produces a jar with both Flink and Scala-related classes removed (since they are provided by the runtime). I did a quick check and the Flink job runs just fine in a local cluster. It would be nice if the community could confirm that this is indeed the way to build Flink-based Scala applications...

BTW, I updated the repo with the above-mentioned override, in case you want to give it a try: https://github.com/salvalcantara/bazel-flink-scala
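PS: For anyone who wants to double-check, this is the kind of inspection I ran (paths as in my repo layout); with the overrides in place, the grep should come back empty:

```
jar -tf bazel-bin/src/main/scala/org/example/word_count_deploy.jar | grep '^scala/'
```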