Flink on YARN - tmp directory

Chris Hebert
Hi,
 
My jobs create tmp files like so:

java.nio.file.Path tmpFilePath = java.nio.file.Files.createTempFile("tmpFile", ".txt");

They currently appear in /tmp/, but I want them somewhere else, say /my/tmp/.
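
For reference, a minimal sketch of why these files end up in /tmp/: the two-argument Files.createTempFile creates the file in the JVM's default temporary-file directory, which is normally taken from the java.io.tmpdir system property. The class and method names below (TmpDirProbe, jvmTmpDir, createProbeFile) are made up for illustration and are not part of Flink or Beam.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

public class TmpDirProbe {

    // Directory this JVM treats as its default temp directory.
    static Path jvmTmpDir() {
        return Paths.get(System.getProperty("java.io.tmpdir"));
    }

    // Same kind of call as in the job: no directory argument,
    // so the JVM default temp directory is used.
    static Path createProbeFile() throws IOException {
        return Files.createTempFile("tmpFile", ".txt");
    }

    public static void main(String[] args) throws IOException {
        Path file = createProbeFile();
        System.out.println("java.io.tmpdir = " + jvmTmpDir());
        System.out.println("temp file      = " + file);
        Files.deleteIfExists(file); // clean up the probe file
    }
}
```

Running this inside the same JVM as the job code should show which directory that JVM has actually picked up.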

The Flink on YARN docs say:
Flink on YARN will overwrite the following configuration parameters jobmanager.rpc.address (because the JobManager is always allocated at different machines), taskmanager.tmp.dirs (we are using the tmp directories given by YARN) and parallelism.default if the number of slots has been specified.
How would I specify a different tmp directory for a job without modifying my YARN tmp directories?

I tried setting the taskmanager.tmp.dirs property in conf/flink-conf.yaml anyway; that failed.

I appended -Djava.io.tmpdir=/my/tmp/ to JVM_ARGS and to all three variations of DEFAULT_ENV_JAVA_OPTS in bin/config.sh; that failed.

I passed -Djava.io.tmpdir=/my/tmp/ and variations as arguments to ./bin/yarn-session.sh, ./bin/flink run, et cetera; that failed too.

Odd observation:
The hadoop.tmp.dir property is set in my core-site.xml to /some/other/tmp/, yet Flink writes to /tmp/. My yarn-site.xml specifies no tmp.

Side note:
My Flink job is a Beam pipeline. I doubt that's relevant, but let me know if it is.

Thanks,
Chris

Re: Flink on YARN - tmp directory

Chris Hebert
I should also note that the above steps did get the Flink JobManager and TaskManagers to save their tmp web dashboard files to /my/tmp/, and the Dashboard showed that the taskmanager.tmp.dirs property had been properly set to /my/tmp/. However, the tmp files written by my jobs still went to /tmp/.
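
One way to check whether a -Djava.io.tmpdir flag actually reaches the JVM that runs the job code is to log the JVM's input arguments from inside a task. The following is only a sketch: JvmTmpDirReport, tmpDirFlags, and report are hypothetical names, while ManagementFactory.getRuntimeMXBean().getInputArguments() is the standard JDK call for listing the arguments the JVM was started with.

```java
import java.lang.management.ManagementFactory;
import java.util.List;
import java.util.stream.Collectors;

public class JvmTmpDirReport {

    // Keep only the JVM arguments that set java.io.tmpdir.
    static List<String> tmpDirFlags(List<String> jvmArgs) {
        return jvmArgs.stream()
                .filter(arg -> arg.startsWith("-Djava.io.tmpdir="))
                .collect(Collectors.toList());
    }

    // One-line summary: the effective property value plus any
    // matching flags found among this JVM's input arguments.
    static String report() {
        List<String> jvmArgs = ManagementFactory.getRuntimeMXBean().getInputArguments();
        return "java.io.tmpdir=" + System.getProperty("java.io.tmpdir")
                + ", flags=" + tmpDirFlags(jvmArgs);
    }

    public static void main(String[] args) {
        System.out.println(report());
    }
}
```

If the flags list is empty in the TaskManager logs, the option was not passed to that JVM, whatever the Dashboard shows for taskmanager.tmp.dirs.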

On Fri, Jul 28, 2017 at 4:55 PM, Chris Hebert <[hidden email]> wrote:

Re: Flink on YARN - tmp directory

Aljoscha Krettek
Hi Chris,

I think in this case we need to change what is passed as "-Djava.io.tmpdir" to the JVMs that run the TaskManagers. You should be able to achieve this via env.java.opts or, more specifically, env.java.opts.taskmanager [1]. The directory specified via taskmanager.tmp.dirs is only used to set the internal Flink tmp directories; it doesn't change what Java assumes as the tmp directory.

You should be able to set env.java.opts.taskmanager in the flink-conf.yaml or pass it as a "dynamic property" when running via bin/flink (per-job YARN cluster) or when creating the YARN session. For example:

bin/flink ... -yD env.java.opts.taskmanager="-Djava.io.tmpdir=/my/tmp" ...
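
Independent of the JVM option, the job code could also name the directory explicitly, since Files.createTempFile has a three-argument overload that takes the target directory. This is only a sketch under assumptions: it presumes /my/tmp can be created and written to by the user running the YARN containers on every TaskManager host, and ExplicitTmpDir and createTempFileIn are hypothetical names.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

public class ExplicitTmpDir {

    // Create a temp file in the given directory instead of the
    // JVM default temp directory (java.io.tmpdir).
    static Path createTempFileIn(Path dir) throws IOException {
        Files.createDirectories(dir); // no-op if the directory already exists
        return Files.createTempFile(dir, "tmpFile", ".txt");
    }

    public static void main(String[] args) throws IOException {
        // Assumption: the directory comes from the first argument,
        // falling back to the /my/tmp path used in this thread.
        Path dir = Paths.get(args.length > 0 ? args[0] : "/my/tmp");
        Path file = createTempFileIn(dir);
        System.out.println("temp file = " + file);
    }
}
```

This keeps the job's files out of /tmp/ even if the JVM-wide setting is not applied, at the cost of hard-wiring a location into the job.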

Best,
Aljoscha

[1] https://ci.apache.org/projects/flink/flink-docs-release-1.3/setup/config.html#common-options
On 29. Jul 2017, at 00:00, Chris Hebert <[hidden email]> wrote:
