Is there a way to get file "metadata" as part of stream?

John Smith
Hi, I'm reading a CSV file using env.readFile() with RowCsvInputFormat.

Is there a way to get the filename as part of the row stream?

The filename contains a unique identifier that I'd like to tag the rows with.

Re: Is there a way to get file "metadata" as part of stream?

Till Rohrmann
Hi John,

out of the box, Flink does not provide this functionality. However, you might be able to write your own CsvInputFormat that overrides fillRecord so that it produces a record whose first field contains the filename. You can obtain the filename from the currentSplit field. I haven't tried it out myself, though.

Cheers,
Till

On Fri, Jul 31, 2020 at 5:54 PM John Smith <[hidden email]> wrote:
> Hi, so reading a CSV file using env.readFile() with RowCsvInputFormat.
>
> Is there a way to get the filename as part of the row stream?
>
> The file contains a unique identifier to tag the rows with.
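The override Till describes is a template-method hook: the base format parses each line and then calls fillRecord to build the output record, so a subclass can prepend anything it knows about the file currently being read. Because a real Flink subclass needs the Flink dependencies to compile, the sketch below models the hook in plain Java. SketchCsvFormat, its open(String) method, currentFilePath and FileNameTaggingFormat are all hypothetical stand-ins, not Flink API; currentFilePath plays the role of currentSplit, and the line splitting is deliberately naive (no quoting or escaping).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in for Flink's CsvInputFormat (NOT the real Flink API):
// the base class parses a line, then delegates record construction to the
// fillRecord hook, which subclasses override (template method pattern).
abstract class SketchCsvFormat<T> {
    // Plays the role of Flink's currentSplit: the file currently being read.
    protected String currentFilePath;

    // Must be called before readRecord, analogous to opening an input split.
    public void open(String filePath) {
        this.currentFilePath = filePath;
    }

    public T readRecord(String line) {
        // Naive split on commas; -1 keeps trailing empty fields.
        Object[] parsedValues = line.split(",", -1);
        return fillRecord(parsedValues);
    }

    protected abstract T fillRecord(Object[] parsedValues);
}

// Subclass that prepends the file name as field 0, as suggested in the thread.
class FileNameTaggingFormat extends SketchCsvFormat<List<Object>> {
    @Override
    protected List<Object> fillRecord(Object[] parsedValues) {
        List<Object> row = new ArrayList<>(parsedValues.length + 1);
        // Keep only the last path component (the bare file name).
        String name = currentFilePath.substring(currentFilePath.lastIndexOf('/') + 1);
        row.add(name);
        row.addAll(Arrays.asList(parsedValues));
        return row;
    }
}
```

For example, after open("/data/in/batch-42.csv"), readRecord("a,1,x") returns [batch-42.csv, a, 1, x]. In an actual Flink job the equivalent would presumably be a subclass of RowCsvInputFormat whose fillRecord(Row reuse, Object[] parsedValues) writes currentSplit.getPath().getName() into field 0 and shifts the parsed values by one position. Since env.readFile() derives the stream's type from the input format, getProducedType() would likely need to be overridden too, so that the row type includes the extra String field.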