Collection of files as input

Camelia-Elena Ciolac
Hello,

I am working on a use case where the input is a collection of files.
I am using env.createInput with an AvroInputFormat. For a single input file, it works fine to specify it as new Path(args[0]).
But is it possible (and if so, how) to create a DataSet from a collection of files directly?

I thought of a workaround: build one DataSet, dsUnion, to hold the union result,
and a second DataSet, dsCurrent, that is created as the input for a single file.

read the first file into dsUnion

in a loop, repeat:
      read another file into dsCurrent
      dsUnion = dsUnion.union(dsCurrent)
until all files in the collection are processed.
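
The loop above is a left fold over the collection with union as the combining step. A minimal sketch of that shape, written in plain Java so it runs without Flink on the classpath; the class name UnionFold, the helper unionAll, and the list-based stand-in for per-file data sets are all hypothetical and only illustrate the pattern:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.BinaryOperator;

class UnionFold {

    // Combines all inputs pairwise, left to right: ((first + second) + third) ...
    static <T> T unionAll(List<T> inputs, BinaryOperator<T> union) {
        if (inputs.isEmpty()) {
            throw new IllegalArgumentException("need at least one input");
        }
        T result = inputs.get(0); // "read the first file into dsUnion"
        for (T current : inputs.subList(1, inputs.size())) {
            result = union.apply(result, current); // dsUnion = dsUnion.union(dsCurrent)
        }
        return result;
    }

    public static void main(String[] args) {
        // Stand-in for per-file data sets: each "file" is just a list of records.
        List<List<String>> perFile = Arrays.asList(
                Arrays.asList("a1", "a2"),
                Arrays.asList("b1"),
                Arrays.asList("c1"));

        // Stand-in for union: concatenate two record lists into a new list.
        List<String> all = unionAll(perFile, (x, y) -> {
            List<String> out = new ArrayList<>(x);
            out.addAll(y);
            return out;
        });

        System.out.println(all); // prints [a1, a2, b1, c1]
    }
}
```

In the Flink case, the list elements would be the DataSets created with env.createInput for each file, and the combining step would be the union call from the pseudocode.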

Is there a simpler solution with the Flink API?

Thanks in advance!
Camelia


Re: Collection of files as input

Fabian Hueske
Hi Camelia,

FileInputFormats such as AvroInputFormat can also read all files in a directory if the directory is specified as the path.
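
Since a directory path pulls in the directory's files as a whole, it may be worth checking what the directory contains before passing it as the input path. A small pre-flight sketch in plain Java; the class DirectoryPreview and the method listInputFiles are hypothetical names, and Path here is java.nio.file.Path, not Flink's Path. It only lists the regular files directly inside the directory; which files Flink itself picks up (hidden files, nested directories) may differ by version and is not modeled here:

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

class DirectoryPreview {

    // Returns the names of the regular files directly inside dir, sorted by name.
    // Subdirectories are skipped and not descended into.
    static List<String> listInputFiles(Path dir) throws IOException {
        try (Stream<Path> entries = Files.list(dir)) {
            return entries
                    .filter(Files::isRegularFile)
                    .map(p -> p.getFileName().toString())
                    .sorted()
                    .collect(Collectors.toList());
        }
    }

    public static void main(String[] args) throws IOException {
        // Same convention as in the post: the input location comes from args[0].
        Path dir = Paths.get(args.length > 0 ? args[0] : ".");
        for (String name : listInputFiles(dir)) {
            System.out.println(name);
        }
    }
}
```

If the listing shows only the intended Avro files, the directory can be given to the input format in place of the single file path.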

Hope that helps.

Best, Fabian

In reply to Camelia-Elena Ciolac's message of 2014-10-24 12:08 GMT+02:00.