Hello,
I want to read a directory containing text files with Flink. As I already found out, I can simply point the environment to the directory and it will read all the files. What I couldn't find out is whether it's possible to keep the file metadata somehow. Concretely, I need the timestamp, the filename and the file content. Is there a way to do this with the ExecutionEnvironment?
Cheers, Ronny
Hi Ronny,
check out this answer on SO: http://stackoverflow.com/questions/30599616/create-objects-from-input-files-in-apache-flink
It is a similar use case. I guess you can get the metadata from the input split as well.
On Wed, Jul 1, 2015 at 11:30 AM, Ronny Bräunlich <[hidden email]> wrote:
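For reference, the record Ronny is asking for (filename, timestamp, file content) can be sketched in plain Java, outside Flink. This is only an illustration of the data a custom input format would have to emit per file; it is not Flink API, and the names FileWithMetadata and readDirectory are made up for this sketch. It assumes the files are UTF-8 text and small enough to read whole.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

// Hypothetical holder for one file plus its metadata (not a Flink class).
public class FileWithMetadata {
    public final String fileName;
    public final long lastModifiedMillis;
    public final String content;

    public FileWithMetadata(String fileName, long lastModifiedMillis, String content) {
        this.fileName = fileName;
        this.lastModifiedMillis = lastModifiedMillis;
        this.content = content;
    }

    // Reads every regular file directly inside dir (no recursion)
    // and keeps its name, last-modified time and full content together.
    public static List<FileWithMetadata> readDirectory(Path dir) throws IOException {
        List<FileWithMetadata> result = new ArrayList<>();
        try (DirectoryStream<Path> stream = Files.newDirectoryStream(dir)) {
            for (Path p : stream) {
                if (!Files.isRegularFile(p)) {
                    continue; // skip subdirectories and other non-files
                }
                // Assumption: the file is UTF-8 encoded text.
                String content = new String(Files.readAllBytes(p), StandardCharsets.UTF_8);
                long modified = Files.getLastModifiedTime(p).toMillis();
                result.add(new FileWithMetadata(p.getFileName().toString(), modified, content));
            }
        }
        return result;
    }
}
```

In a Flink job the same three fields would have to come from the input split and the file system inside a custom input format, as the linked answer suggests.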
Hi Robert,
thank you for your quick answer. Just one additional question: When I use the ExecutionEnvironment like this: DataSource<String> files = env.readTextFile("file:///Users/me/path/to/file/dir"); shouldn't it read all the files in dir? I have three .json files there, but when I print the result, nothing is shown.
Cheers, Ronny
On 01.07.2015 at 11:35, Robert Metzger <[hidden email]> wrote:
Hi Robert,
please ignore my previous question. My file names start with an underscore, and I just found out that FileInputFormat filters out such files in acceptFile().
Cheers, Ronny
On 01.07.2015 at 11:35, Robert Metzger <[hidden email]> wrote:
Okay. We filter out files starting with an underscore because that is the same behavior as Hadoop. Hadoop always creates some underscore-prefixed files, so without the filter Flink would read these files as well when reading the results of a MapReduce job.
On Wed, Jul 1, 2015 at 12:15 PM, Ronny Bräunlich <[hidden email]> wrote:
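To make the filtering rule discussed above concrete, here is a minimal stand-alone sketch of it in plain Java. It is modeled only on the description in this thread and is not a copy of Flink's FileInputFormat.acceptFile(); the class UnderscoreFilterSketch and both method names are made up, and the file names in the usage note are examples only.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration of the rule described in the thread (not Flink code).
public class UnderscoreFilterSketch {

    // A file is accepted unless its name starts with an underscore.
    static boolean acceptFile(String fileName) {
        return !fileName.startsWith("_");
    }

    // Returns the accepted names, preserving their original order.
    static List<String> filterAccepted(List<String> fileNames) {
        List<String> accepted = new ArrayList<>();
        for (String name : fileNames) {
            if (acceptFile(name)) {
                accepted.add(name);
            }
        }
        return accepted;
    }
}
```

Under this rule a Hadoop marker file such as _SUCCESS is skipped, and so is a data file named _data.json, which matches what Ronny observed. Renaming the input files so that they no longer start with an underscore makes them pass the check.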