Hi,
We noticed that every time an application runs, it uploads the flink-dist artifact to the /user/<user>/.flink HDFS directory. This causes a user disk space quota issue for us, as we
submit thousands of apps to our cluster per hour. We had a similar problem with our Spark applications, where the Spark assembly package was uploaded for every app. Spark provides a configuration option that points applications at a shared location in HDFS so they
don’t need to upload the package on every run, and that was our solution (see the “spark.yarn.jar” configuration if interested).
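For reference, our Spark-side fix was roughly the following entry in spark-defaults.conf (the HDFS path here is illustrative, not our actual one):

# spark-defaults.conf: point all apps at a pre-uploaded assembly
spark.yarn.jar hdfs:///apps/spark/lib/spark-assembly.jar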
Looking at the Resource Orchestration Frameworks page, I see there might be a similar concept through a “yarn.flink-dist-jar” configuration option. I wanted to place the flink-dist package we’re using in a location in HDFS and configure our jobs to point to it, e.g.
yarn.flink-dist-jar: hdfs:///user/delp/.flink/flink-dist_2.11-1.9.1.jar
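We staged the jar ourselves with roughly the following (a one-time step, run from our Flink distribution directory; paths match the config above, adjust as needed):

# upload flink-dist to the shared HDFS location once
hadoop fs -mkdir -p /user/delp/.flink
hadoop fs -put lib/flink-dist_2.11-1.9.1.jar /user/delp/.flink/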
Am I correct that this is what I’m looking for? I gave this a try with some jobs today, and based on what I’m seeing in the launch_container.sh of our YARN application, it
still looks like the jar is being uploaded:
export _FLINK_JAR_PATH="hdfs://d279536/user/delp/.flink/application_1583031705852_117863/flink-dist_2.11-1.9.1.jar"
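For reference, this is roughly how we pulled that script (the -log_files option works on our Hadoop version; yours may differ):

# fetch launch_container.sh from the aggregated logs and check the staged jar path
yarn logs -applicationId application_1583031705852_117863 -log_files launch_container.sh | grep _FLINK_JAR_PATH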
How can I confirm? Or is this perhaps not the config option I’m looking for?
Best,
Andreas