Hello,
I am working on a use case where we have a collection of files as input.
I am using env.createInput with an AvroInputFormat. For a single input file, passing new Path(args[0]) works fine.
But is it possible (and if so, how) to create a DataSet directly from a collection of files?
I thought of a workaround: build one DataSet, dsUnion, to hold the union result,
and a second DataSet, dsCurrent, that is created as the input for one file at a time:
  read the first file into dsUnion
  in a loop, repeat:
    read the next file into dsCurrent
    dsUnion = dsUnion.union(dsCurrent)
  until all files in the collection are processed.
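
To make the loop above concrete, here is a minimal sketch of it in Java. It is deliberately generic so that it compiles without Flink on the classpath: the class name UnionFiles and the helper unionAll are names I made up for illustration, not part of any Flink API. The "read" function stands for whatever creates a DataSet from one path, and "union" stands for DataSet.union.

```java
import java.util.Iterator;
import java.util.List;
import java.util.function.BinaryOperator;
import java.util.function.Function;

// Hypothetical helper (not a Flink class): folds a list of input paths
// into one data set by reading each path and unioning the results.
final class UnionFiles {

    static <P, D> D unionAll(List<P> paths,
                             Function<P, D> read,
                             BinaryOperator<D> union) {
        if (paths.isEmpty()) {
            throw new IllegalArgumentException("need at least one input file");
        }
        Iterator<P> it = paths.iterator();
        // "read the first file into dsUnion"
        D dsUnion = read.apply(it.next());
        // "in a loop ... until all files in the collection are processed"
        while (it.hasNext()) {
            D dsCurrent = read.apply(it.next());
            dsUnion = union.apply(dsUnion, dsCurrent);
        }
        return dsUnion;
    }
}
```

In my job I would expect to call it roughly as unionAll(files, p -> env.createInput(new AvroInputFormat<>(new Path(p), MyRecord.class)), DataSet::union), where MyRecord is a placeholder for my Avro record class. I have not verified this exact call against a specific Flink version.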
Is there a simpler way to do this with the Flink API?
Thanks in advance!
Camelia