Hello all,

I want to run a Flink log processing job, and my input is stored locally in a nested directory structure like the following:

logs_dir/
|-----/machine1/
|-----------/january.log
|-----------/february.log
...
|-----/machine2/
...
etc.

When I provide "logs_dir" as the argument to readTextFile(), nothing is read and no exception or error is returned. Copying the nested individual files machine1/january.log, machine1/february.log, ..., into the same directory works fine, but I was wondering whether there is a better way to do this?

Thank you!
V.
|
Hi!

Not right now. The input formats do not recursively enumerate files. In that, we followed the way Hadoop did it.

If that is something of interest, it should not be too hard to add an option to the FileInputFormat to do a complete recursive traversal of the directory structure.

Greetings,
Stephan

On Tue, Dec 2, 2014 at 4:32 PM, Vasiliki Kalavri <[hidden email]> wrote:
|
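Until such an option exists, one possible workaround is to enumerate the nested files yourself and hand each path to the job separately. The following is only a minimal sketch using the plain JDK (no Flink classes); the class name NestedFileLister and the method listFilesRecursively are hypothetical names chosen for illustration and are not part of Flink. Passing each returned path to readTextFile() and combining the results is left out here and is an assumption about how you would use it.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical helper, not part of Flink: lists all files below a directory.
final class NestedFileLister {

    /** Returns every regular file below root, at any depth, in sorted order. */
    static List<Path> listFilesRecursively(Path root) {
        // Files.walk visits root and all of its descendants; the stream
        // must be closed, hence the try-with-resources block.
        try (Stream<Path> paths = Files.walk(root)) {
            return paths
                    .filter(Files::isRegularFile) // skip the directories themselves
                    .sorted()
                    .collect(Collectors.toList());
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // "logs_dir" is the example directory name from the question above.
        Path root = Paths.get(args.length > 0 ? args[0] : "logs_dir");
        if (!Files.isDirectory(root)) {
            System.err.println("Not a directory: " + root);
            return;
        }
        for (Path file : listFilesRecursively(root)) {
            System.out.println(file.toAbsolutePath());
        }
    }
}
```

Run against the layout from the question, this should print machine1/january.log, machine1/february.log, and so on as absolute paths, which avoids copying the files into a single flat directory.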
Hi,

thanks for replying!

It would certainly be useful for my use case, but it is not absolutely necessary. If you think other people might find it useful too, I can open an issue.

If not, I believe it would be nice to print a warning when a nested directory is given as the input path, since right now the files in the base directory are processed normally, but the nested ones are silently ignored.

Cheers,
V.

On 2 December 2014 at 16:52, Stephan Ewen <[hidden email]> wrote:
|
+1 for adding such a feature. It should be very easy to implement (basically, extend the createInputSplits() method).

On Tue, Dec 2, 2014 at 5:22 PM, Vasiliki Kalavri <[hidden email]> wrote:
|
+1 I find this useful as well.
On 04 Dec 2014, at 22:02, Robert Metzger <[hidden email]> wrote:
|