HDFS directory rename


HDFS directory rename

Flavio Pompermaier
Hi all,

In my Flink job I wanted to move a folder (containing other folders and files) to another location.
For example, I wanted to move folder X into folder Y, where my HDFS looks like this:

myRootDir/X/a/aa/aaa/someFile1
myRootDir/X/b/bb/bbb/someFile2
myRootDir/Y

I tried to use rename(), but it fails silently (it just returns false) if the parent directory of the target doesn't exist.
Is there an easy way to do this with the Flink FS APIs?
If rename() is intended to work that way, wouldn't a move() API be useful?

Best,
Flavio
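You can reproduce the silent failure described above without Flink or HDFS. A later reply in this thread says Flink's rename() follows the semantics of the Java File class, and java.io.File.renameTo likewise just returns false when the target's parent directory is missing. The following is a minimal local sketch. It assumes a temporary directory stands in for myRootDir and uses a hypothetical archive folder Y/archive as the target. The class name RenameDemo is illustrative only.

```java
import java.io.File;
import java.io.IOException;
import java.nio.file.Files;

// Local reproduction only: java.io.File stands in for Flink's FileSystem and HDFS.
class RenameDemo {

    /** Builds X/a/aa/aaa/someFile1, X/b/bb/bbb/someFile2 and an empty Y under a temp "myRootDir". */
    static File buildExampleTree() throws IOException {
        File root = Files.createTempDirectory("myRootDir").toFile();
        for (String file : new String[] {"X/a/aa/aaa/someFile1", "X/b/bb/bbb/someFile2"}) {
            File f = new File(root, file);
            f.getParentFile().mkdirs();
            f.createNewFile();
        }
        new File(root, "Y").mkdir();
        return root;
    }

    public static void main(String[] args) throws IOException {
        File root = buildExampleTree();
        File source = new File(root, "X");
        // Hypothetical target whose parent directory (Y/archive) does not exist yet.
        File target = new File(root, "Y/archive/X");
        // No exception is thrown; the failure is visible only through the return value.
        boolean renamed = source.renameTo(target);
        System.out.println("renamed = " + renamed); // prints "renamed = false"
        // Once the parent exists, the same call succeeds.
        new File(root, "Y/archive").mkdirs();
        System.out.println("renamed = " + source.renameTo(target)); // prints "renamed = true"
    }
}
```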


Re: HDFS directory rename

Fabian Hueske-2
Do you want to move the folder within a running job? This might cause a lot of problems, because you cannot (easily) control when a move command would be executed.

Wouldn’t it be a better idea to do that after a job is finished and use the regular HDFS client?

From: [hidden email]
Sent: Friday, 17 July 2015 10:02
To: [hidden email]


Re: HDFS directory rename

Flavio Pompermaier
Of course, I move the folder before the job starts or after it ends :)
My job applies some transformations to the row data and puts the results in another folder.
The next time the job runs, it checks whether the output folder exists and, if so, moves that folder to an archive dir.
I wanted to use the Flink client because it is FS-independent, so I can choose which FS to use at runtime.
At the moment, this is what I do:

Path dataSourceArchivePath = new Path(rowChunksArchiveBaseDir, dataSourceId);
FileSystem fs = dataSourceArchivePath.getFileSystem();
// rename() returns false if the target's parent is missing, so create the archive base dir first
fs.mkdirs(dataSourceArchivePath.getParent());
// On HDFS, renaming onto an existing directory moves the source into that directory
boolean moved = fs.rename(dataSourceDirPath, dataSourceArchivePath.getParent());
LOG.info("Archiving {} to {} {}", dataSourceDirPath, dataSourceArchivePath, moved ? "successful" : "failed");

Moreover, I still have to delete the empty sub-paths of dataSourceArchivePath after the move, but I can't, because there's no listChildren() on the Path object :(
I was looking for a simpler way to do this. Does one exist?
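The workaround above boils down to three steps: create the parent, rename, then check the boolean. A fail-loud move() helper along those lines might look like the sketch below. It uses java.nio.file as a runnable stand-in and is not part of Flink's API. With Flink, you would call fs.mkdirs(parent) and throw an IOException when fs.rename(...) returns false. The helper name move and the class MoveSketch are hypothetical.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper; java.nio.file stands in for Flink's FileSystem here.
class MoveSketch {

    /**
     * Moves source to target, creating any missing parent directories of target first.
     * Unlike a boolean rename(), every failure surfaces as an IOException.
     */
    static Path move(Path source, Path target) throws IOException {
        Path parent = target.toAbsolutePath().getParent();
        if (parent != null) {
            // Same role as fs.mkdirs(...) on the archive parent in the workaround above.
            Files.createDirectories(parent);
        }
        // Throws (e.g. NoSuchFileException, FileAlreadyExistsException) instead of returning false.
        return Files.move(source, target);
    }
}
```

The workaround passes the target's parent to rename(), whereas this helper passes the full target path. That avoids relying on HDFS's move-into-an-existing-directory behaviour. For example, java.io.File.renameTo does not move a directory into an existing non-empty one.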

On Fri, Jul 17, 2015 at 10:08 AM, <[hidden email]> wrote:

Re: HDFS directory rename

Stephan Ewen
I don't think there is a simpler way to do this.

Here, Flink follows the semantics of Hadoop's HDFS file system and of the Java File class, both of which behave that way.

But it seems your solution is working, even if it needs a few extra lines of code.

On Fri, Jul 17, 2015 at 11:17 AM, Flavio Pompermaier <[hidden email]> wrote:

Re: HDFS directory rename

Flavio Pompermaier
OK. What I'm still not able to do is recursively remove empty dirs from the source dir, because there's no getChildrenCount() or getChildren() API for a given Path.
How can I do that?

On Tue, Jul 21, 2015 at 3:13 PM, Stephan Ewen <[hidden email]> wrote:

Re: HDFS directory rename

Fabian Hueske-2
How about FileStatus[] FileSystem.listStatus()?
FileStatus gives the length of a file, the path, whether it's a dir, etc.
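With Flink's FileSystem, this is a loop over fs.listStatus(dir) that reads FileStatus.getPath(), isDir() and getLen() for each entry. The runnable sketch below mimics that loop with java.io.File.listFiles() instead of the Flink API. ListStatusSketch and describeChildren are hypothetical names.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// java.io.File stand-in for iterating FileSystem.listStatus(dir): each child reports
// its name, whether it is a directory and, for regular files, its length.
class ListStatusSketch {

    /** One line per direct child of dir: "d <name>" for directories, "f <bytes> <name>" for files. */
    static List<String> describeChildren(File dir) {
        List<String> lines = new ArrayList<>();
        File[] children = dir.listFiles();
        if (children == null) {
            return lines; // not a directory, or an I/O error occurred
        }
        for (File child : children) {
            if (child.isDirectory()) {
                lines.add("d " + child.getName());
            } else {
                lines.add("f " + child.length() + " " + child.getName());
            }
        }
        Collections.sort(lines); // listFiles() guarantees no particular order
        return lines;
    }
}
```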

2015-07-22 11:04 GMT+02:00 Flavio Pompermaier <[hidden email]>:

Re: HDFS directory rename

Flavio Pompermaier
I can detect whether it's a dir, but how can I detect whether it's empty?

On Wed, Jul 22, 2015 at 12:49 PM, Fabian Hueske <[hidden email]> wrote:
--

Flavio Pompermaier
Development Department
_______________________________________________
OKKAMSrl www.okkam.it

Phone: +(39) 0461 283 702
Fax: + (39) 0461 186 6433
Email: [hidden email]
Headquarters: Trento (Italy), via G.B. Trener 8
Registered office: Trento (Italy), via Segantini 23

Confidentiality notice. This e-mail transmission may contain legally privileged and/or confidential information. Please do not read it if you are not the intended recipient(s). Any use, distribution, reproduction or disclosure by any other person is strictly prohibited. If you have received this e-mail in error, please notify the sender and destroy the original transmission and its attachments without reading or saving it in any manner.


Re: HDFS directory rename

Fabian Hueske-2

listStatus() should return an empty array
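Combined with the earlier replies, this means an empty directory is one whose listing comes back as an empty array. Empty directories can therefore be removed bottom-up: prune the subdirectories first, then list the directory again and delete it if nothing is left. In Flink terms, the recursion would use fs.listStatus(dir), FileStatus.isDir() and getPath(), and the removal would use fs.delete(dir, false). This assumes listStatus() does return an empty array for an empty directory, as suggested above. The sketch below is a runnable java.io.File stand-in, not the Flink API. EmptyDirPruner is a hypothetical name.

```java
import java.io.File;

// java.io.File stand-in: listFiles() plays the role of FileSystem.listStatus(), and
// File.delete() that of a non-recursive FileSystem.delete(path, false).
class EmptyDirPruner {

    /**
     * Depth-first: prunes empty subdirectories of dir, then deletes dir itself if nothing is left.
     * Regular files are never deleted. Returns true if dir was removed.
     */
    static boolean pruneEmptyDirs(File dir) {
        File[] children = dir.listFiles();
        if (children == null) {
            return false; // not a directory, or not readable
        }
        for (File child : children) {
            if (child.isDirectory()) {
                pruneEmptyDirs(child);
            }
        }
        // Re-list after pruning: an empty array means the directory is now empty.
        File[] remaining = dir.listFiles();
        return remaining != null && remaining.length == 0 && dir.delete();
    }
}
```

The starting directory is also deleted if it ends up empty. To keep it, call pruneEmptyDirs on each of its subdirectories instead. Also note that isDirectory() follows symbolic links, so a link to a directory would be descended into.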

On Jul 22, 2015 13:11, "Flavio Pompermaier" <[hidden email]> wrote: