(DEPRECATED) Apache Flink User Mailing List archive.

Get file metadata

5 messages

Ronny Bräunlich

Get file metadata

Hello,

I want to read a directory of text files with Flink.
As I already found out, I can simply point the environment at the directory and it will read all the files.
What I couldn’t find out is whether it’s possible to keep the file metadata somehow.
Concretely, I need the timestamp, the filename, and the file content. Is there a way to do this with the ExecutionEnvironment?

Cheers,
Ronny

rmetzger0

Re: Get file metadata

Hi Ronny,

check out this answer on SO: http://stackoverflow.com/questions/30599616/create-objects-from-input-files-in-apache-flink

It is a similar use case ... I guess you can get the metadata from the input split as well.
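A minimal sketch of that idea, written in plain Java (java.nio only) so it runs without Flink: a reader is opened once per file, remembers that file's name and modification time, and attaches both to every line it emits. In Flink the analogous hook would be overriding open(FileInputSplit) in a FileInputFormat subclass and reading split.getPath() there; note that Flink may cut a large file into several splits, each of which still carries the file's path. SplitMetadataSketch, SplitReader and FileRecord are hypothetical names, not Flink classes.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Stream;

// Hypothetical, Flink-free sketch of "take the metadata from the split"; not Flink's API.
class SplitMetadataSketch {

    // Hypothetical record: one line of text plus metadata of the file it came from.
    record FileRecord(String fileName, long modificationTime, String line) {}

    // Hypothetical stand-in for an input format. open() is called once per file
    // (our "split"), captures that file's metadata, and records() tags every line with it.
    static final class SplitReader {
        private String fileName;
        private long modificationTime;
        private List<String> lines = List.of();

        void open(Path split) throws IOException {
            fileName = split.getFileName().toString();
            modificationTime = Files.getLastModifiedTime(split).toMillis();
            lines = Files.readAllLines(split, StandardCharsets.UTF_8);
        }

        List<FileRecord> records() {
            List<FileRecord> out = new ArrayList<>();
            for (String line : lines) {
                out.add(new FileRecord(fileName, modificationTime, line));
            }
            return out;
        }
    }

    // Reads every regular file in dir (in name order), opening one reader per file.
    static List<FileRecord> readDirectory(Path dir) throws IOException {
        List<Path> files;
        try (Stream<Path> listing = Files.list(dir)) {
            files = listing.filter(Files::isRegularFile).sorted().toList();
        }
        List<FileRecord> all = new ArrayList<>();
        for (Path file : files) {
            SplitReader reader = new SplitReader();
            reader.open(file);
            all.addAll(reader.records());
        }
        return all;
    }
}
```

Reading a directory with two small .json files this way yields one FileRecord per line, each tagged with the name and timestamp of the file it came from, which covers the three pieces of information asked for above.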

On Wed, Jul 1, 2015 at 11:30 AM, Ronny Bräunlich wrote:

Ronny Bräunlich

Re: Get file metadata

Hi Robert,

thank you for your quick answer.

Just one additional question:

When I use the ExecutionEnvironment like this: DataSource<String> files = env.readTextFile("file:///Users/me/path/to/file/dir");

Shouldn’t it read all the files in the directory? I have three .json files there, but when I print the result, nothing is shown.

Cheers,

Ronny

On 01.07.2015 at 11:35, Robert Metzger wrote:

Ronny Bräunlich

Re: Get file metadata

In reply to this post by rmetzger0

Hi Robert,

just ignore my previous question.

My file names started with an underscore, and I just found out that FileInputFormat filters out such files in acceptFile().

Cheers,

Ronny

rmetzger0

Re: Get file metadata

Okay. We filter out files starting with an underscore because Hadoop does the same.

Hadoop always creates some files whose names start with an underscore (such as the _SUCCESS marker), so without this filter Flink would also read those files when reading the results of a MapReduce job.
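A plain-Java sketch of the naming rule discussed here (not Flink code): names starting with an underscore are dropped before any file is read, which is why Hadoop's marker files, and Ronny's underscore-prefixed .json files, never show up. In Flink the check lives in FileInputFormat.acceptFile(FileStatus), and overriding it in a subclass is the hook for changing it, although its visibility may differ between Flink versions. UnderscoreFilterSketch, acceptFileName, acceptedFiles and the sample listing are hypothetical.

```java
import java.util.List;

// Hypothetical sketch of the underscore rule; Flink applies its own check in FileInputFormat.acceptFile.
class UnderscoreFilterSketch {

    // Mirrors the rule from this thread: a name starting with "_" is skipped.
    static boolean acceptFileName(String name) {
        return !name.startsWith("_");
    }

    // Returns the names a directory read would actually pick up, in their original order.
    static List<String> acceptedFiles(List<String> names) {
        return names.stream().filter(UnderscoreFilterSketch::acceptFileName).toList();
    }

    public static void main(String[] args) {
        // Made-up listing: a Hadoop marker, a Hadoop part file, and two user files.
        List<String> listing = List.of("_SUCCESS", "part-00000", "_data1.json", "data2.json");
        System.out.println(acceptedFiles(listing)); // prints [part-00000, data2.json]
    }
}
```

Renaming the input files so they no longer start with an underscore avoids the filter without any code changes.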

On Wed, Jul 1, 2015 at 12:15 PM, Ronny Bräunlich wrote: