Hey, I am new to Flink and have a question; I'd like to see if anyone here can help. We have an S3 path that Flink monitors for new files:

val avroInputStream_activity = env.readFile(format, path, FileProcessingMode.PROCESS_

I am doing both internal and external checkpointing. Say a bad file arrives at the path; Flink will retry it several times. I want to remove those bad files and let the process continue. However, since the file path persists in the checkpoint, when I try to resume from the external checkpoint, it throws the following error because the file can no longer be found:

java.io.IOException: Error opening the Input Split s3a://myfile [0,904]: No such file or directory: s3a://myfile

Is there a way to skip this bad file and move on? Thanks in advance.

Best,
Chengzhi Zhao
Hi Chengzhi Zhao,

I think this is an issue with the ContinuousFileReaderOperator rather than with the checkpointing algorithm in general. It could, for example, check whether a path exists before trying to read a file, and ignore the input split instead of throwing an exception and causing a failure. If you want to, you can also work on a fix and contribute it back.

2018-02-06 19:15 GMT+01:00 Chengzhi Zhao <[hidden email]>:
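To make the "check whether the path exists and ignore the split" idea concrete, here is a minimal Java sketch. It is not Flink's actual ContinuousFileReaderOperator code and uses no Flink APIs: the class name SkipMissingSplits and the method readExisting are hypothetical, and local files accessed through java.nio stand in for S3 input splits.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.NoSuchFileException;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration only; not part of Flink.
final class SkipMissingSplits {

    /** Reads every split that still exists and returns the contents in order. */
    static List<String> readExisting(List<Path> splits) throws IOException {
        List<String> contents = new ArrayList<>();
        for (Path split : splits) {
            // Check up front instead of failing the whole job on a missing file.
            if (!Files.exists(split)) {
                System.err.println("Skipping missing split: " + split);
                continue;
            }
            try {
                byte[] bytes = Files.readAllBytes(split);
                contents.add(new String(bytes, StandardCharsets.UTF_8));
            } catch (NoSuchFileException e) {
                // The file can disappear between the check and the read.
                System.err.println("Split vanished while reading: " + split);
            }
        }
        return contents;
    }

    public static void main(String[] args) throws IOException {
        Path good = Files.createTempFile("split", ".txt");
        Files.write(good, "record".getBytes(StandardCharsets.UTF_8));
        // A sibling path that was never created plays the role of the removed bad file.
        Path missing = good.resolveSibling("no-such-split-" + System.nanoTime() + ".txt");

        System.out.println(readExisting(Arrays.asList(missing, good)));
        Files.delete(good);
    }
}
```

The existence check alone is racy, so the sketch also catches NoSuchFileException on the read itself. Any other IOException still propagates, so a genuinely unreadable file would keep failing loudly rather than being dropped silently.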
Thanks, Fabian,
I opened a JIRA ticket and I'd like to work on it if people think this would be an improvement:

Best,
Chengzhi

On Wed, Feb 7, 2018 at 4:17 AM, Fabian Hueske <[hidden email]> wrote:
Great, thank you!

Best,
Fabian

2018-02-07 23:52 GMT+01:00 Chengzhi Zhao <[hidden email]>: