readFile - Continuous file processing

readFile - Continuous file processing

Nancy Estrada
Hi guys,

I have the following use case. Every day a new file is created and periodically some log records are appended to it. I am reading the file in the following way:

executionEnvironment.readFile(format, directoryPath, PROCESS_CONTINUOUSLY, period.toMilliseconds(),filePathFilter);

However, Flink treats modified files as new files, so all the content of a modified file gets processed again. I know that one solution is to wait until the file contains all of the day's records before processing it, but I would like to process the file continuously. Is there a way to process only the new records in a file?

Thank you in advance! :)
Nancy



Re: readFile - Continuous file processing

Kostas Kloudas
Hi Nancy,

Currently there is no way to do so. Flink only provides the mode you described, i.e.
a modified file is considered a new file. The reason is that many filesystems do not
give you separate creation and modification timestamps.
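
To illustrate why an append causes a full re-read, here is a minimal sketch of a monitor that decides purely on modification time. It is a simplified model written against plain java.nio, not Flink's actual monitoring code, and the class name ModTimeMonitor is hypothetical.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

/** Simplified model of a modification-time-based directory monitor (not Flink's code). */
final class ModTimeMonitor {
    // Highest modification time (epoch millis) observed in any previous scan.
    private long lastSeenModTime = Long.MIN_VALUE;

    /** Returns all lines of every regular file modified since the previous scan. */
    List<String> scan(Path dir) throws IOException {
        List<String> emitted = new ArrayList<>();
        long maxModTime = lastSeenModTime;
        try (DirectoryStream<Path> files = Files.newDirectoryStream(dir)) {
            for (Path file : files) {
                if (!Files.isRegularFile(file)) {
                    continue;
                }
                long modTime = Files.getLastModifiedTime(file).toMillis();
                if (modTime > lastSeenModTime) {
                    // Only the timestamp is known, not which bytes are new,
                    // so the whole file is read again from the start.
                    emitted.addAll(Files.readAllLines(file));
                    maxModTime = Math.max(maxModTime, modTime);
                }
            }
        }
        lastSeenModTime = maxModTime;
        return emitted;
    }
}
```

Calling scan() once after a file is created and again after a record is appended returns the old records a second time together with the new one, which is the behaviour described above.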

If you control the way files are created, a solution could be to write to a different file each time.
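
A minimal sketch of what that could look like on the writer side, assuming you can change the process that produces the log records. The class name BatchFileWriter, the file naming scheme and the separate staging directory are all hypothetical choices for illustration, not anything Flink prescribes. Each batch is written to its own file and then moved into the watched directory, so no file is modified after it first appears there.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.List;
import java.util.concurrent.atomic.AtomicLong;

/** Writes every batch of records to a new, uniquely named file (hypothetical helper). */
final class BatchFileWriter {
    private final Path watchedDir;  // directory that the streaming job monitors
    private final Path stagingDir;  // assumed to be on the same filesystem as watchedDir
    private final AtomicLong sequence = new AtomicLong();

    BatchFileWriter(Path watchedDir, Path stagingDir) {
        this.watchedDir = watchedDir;
        this.stagingDir = stagingDir;
    }

    /** Writes the records to a fresh file and returns its final path. */
    Path writeBatch(List<String> records) throws IOException {
        // Timestamp plus a per-writer counter keeps names unique within this process.
        String name = String.format("records-%d-%06d.log",
                System.currentTimeMillis(), sequence.getAndIncrement());
        Path staged = stagingDir.resolve(name);
        Files.write(staged, records, StandardCharsets.UTF_8);
        // Move the finished file in one step so the monitor never sees it half written.
        // ATOMIC_MOVE throws AtomicMoveNotSupportedException if the filesystem cannot do this.
        return Files.move(staged, watchedDir.resolve(name), StandardCopyOption.ATOMIC_MOVE);
    }
}
```

With this in place, directoryPath in the readFile call would point at the watched directory, and each new file would be picked up exactly once.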

Thanks,
Kostas


> On Jan 31, 2017, at 6:17 PM, Nancy Estrada <[hidden email]> wrote:
>
> Hi guys,
>
> I have the following use case. Every day a new file is created and
> periodically some log records are appended to it. I am reading the file in
> the following way:
>
> executionEnvironment.readFile(format, directoryPath, PROCESS_CONTINUOUSLY,
> period.toMilliseconds(),filePathFilter);
>
> However, Flink treats modified files as new files, so all the content of
> a modified file gets processed again. I know that one solution is to wait
> until the file contains all of the day's records before processing it, but
> I would like to process the file continuously. Is there a way to process
> only the new records in a file?
>
> Thank you in advance! :)
> Nancy
>
> --
> View this message in context: http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/readFile-Continuous-file-processing-tp11384.html
> Sent from the Apache Flink User Mailing List archive at Nabble.com.