Re: Checking for existence of output directory/files before running a batch job

Posted by rmetzger0 on
URL: http://deprecated-apache-flink-user-mailing-list-archive.369.s1.nabble.com/Checking-for-existance-of-output-directory-files-before-running-a-batch-job-tp8573p8584.html

Oops. Looks like Google Mail / Apache / the internet needed 13 minutes to deliver an email.
Sorry for the double answer.

On Fri, Aug 19, 2016 at 3:07 PM, Maximilian Michels <[hidden email]> wrote:
Hi Niels,

Have you tried specifying the fully-qualified path? The default is the
local file system.

For example, hdfs:///path/to/foo

If that doesn't work, do you have the same Hadoop configuration on the
machine where you test?

Cheers,
Max
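
A minimal sketch of why the scheme matters, using only java.net.URI rather than the Flink API. The class name OutputPathCheck and the helper effectiveScheme are hypothetical, and the fallback to "file" assumes the default file system is the local one, as described above. It only shows that a bare name like "foo" carries no scheme, while hdfs:///path/to/foo does:

```java
import java.net.URI;

// Hypothetical illustration, not part of Flink.
class OutputPathCheck {

    // Returns the scheme written in the path, or "file" when there is none.
    // The "file" fallback is an assumption that mirrors "the default is the
    // local file system"; a cluster may be configured with another default.
    public static String effectiveScheme(String path) {
        String scheme = URI.create(path).getScheme();
        return scheme == null ? "file" : scheme;
    }

    public static void main(String[] args) {
        // A bare name has no scheme, so the check would go to the default file system.
        System.out.println("foo -> " + effectiveScheme("foo"));
        // A fully-qualified path names its file system explicitly.
        System.out.println("hdfs:///path/to/foo -> " + effectiveScheme("hdfs:///path/to/foo"));
    }
}
```

In the job itself, the fully-qualified string would be handed to new Path(...) exactly as in the original snippet, so that getFileSystem() can pick the file system from the scheme.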

On Thu, Aug 18, 2016 at 2:02 PM, Niels Basjes <[hidden email]> wrote:
> Hi,
>
> I have a batch job that I run on yarn that creates files in HDFS.
> I want to avoid running this job at all if the output already exists.
>
> So in my code (before submitting the job into yarn-session) I do this:
>
>     String directoryName = "foo";
>
>     Path directory = new Path(directoryName);
>     FileSystem fs = directory.getFileSystem();
>
>     if (!fs.exists(directory)) {
>
>         // run the job
>
>     }
>
> What I found is that this code apparently checks the 'wrong' file system: I
> always get 'false', even if the directory exists in HDFS.
>
> I checked the API of the execution environment, yet I was unable to get the
> 'correct' file system from there.
>
> What is the proper way to check this?
>
>
> --
> Best regards / Met vriendelijke groeten,
>
> Niels Basjes