Hello, We are using Flink to process input events and aggregate and write o/p of our streaming job to S3 using StreamingFileSink but whenever we try to restore the job from a savepoint, the restoration fails with missing part files error. As per my understanding, s3 deletes those part(intermittent) files and can no longer be found on s3. Is there a workaround for this, so that we can use s3 as a sink?
Thanks, Swapnil Kumar |
Hi Swapnil, I am not familiar with the StreamingFileSink, however, this sounds like a checkpointing issue to me FileSink should keep its sink state, and remove from the state the files that it really successfully sinks (perhaps you may want to add a validation here with S3 to check file integrity). This leaves us in the state with the failed files, partial files etc. --- Oytun Tez M O T A W O R D The World's Fastest Human Translation Platform. On Fri, Aug 16, 2019 at 6:02 PM Swapnil Kumar <[hidden email]> wrote:
|
Hi, S3 would delete files only if you have 'lifecycle rules' [1] defined on the bucket. Could that be the case? If so, make sure to disable / extend the object expiration period. Thanks, Rafi On Sat, Aug 17, 2019 at 1:48 AM Oytun Tez <[hidden email]> wrote:
|
My first guess would also be the same as Rafi's: The lifetime of the MPU part files is so too low for that use case. Maybe this can help: - If you want to stop a job with a savepoint and plan to restore later from it (possible much later, so that the MPU Part lifetime might be exceeded), then I would recommend to use Flink 1.9's new "stop with savepoint" feature. That should finalize in-flight uploads and make sure no lingering part files exist. - If you take a savepoint out of a running job to start a new job, you probably need to configure the sink differently anyways, to not interfere with the running job. In that case, I would suggest to change the name of the sink (the operator uid) such that the new job's sink doesn't try to resume (and interfere with) the running job's sink. Best, Stephan On Sat, Aug 17, 2019 at 11:23 PM Rafi Aroch <[hidden email]> wrote:
|
In reply to this post by Swapnil Kumar
Hi Swapnil, We faced this problem once, I think changing checkpoint dir to hdfs and keeping sink dir to s3 with EMRFS s3 consistency enabled solves this problem. If you are not using emr then I don't know how else it can be solved. But in a nutshell because EMRFS s3 consistency uses Dynamo DB in the back end to check for all files being written to s3. It kind of makes s3 consistent and Streaming file sink works just fine. On Sat, Aug 17, 2019, 3:32 AM Swapnil Kumar <[hidden email]> wrote:
|
Hello, could you tell us the version of flink-s3-fs-hadoop library that you are using ? On Sun 18 Aug 2019 at 16:24, taher koitawala <[hidden email]> wrote:
|
We used EMR version 5.20 which has Flink 1.6.2 and all other libraries were according to this version. So flink-s3-fs-hadoop was 1.6.2 as well. On Sun, Aug 18, 2019, 9:55 PM Ayush Verma <[hidden email]> wrote:
|
Hi, I would suggest you upgrade flink to 1.7.x and flink-s3-fs-hadoop to 1.7.2. You might be facing this issue:Kind regards Ayush Verma On Sun, Aug 18, 2019 at 6:02 PM taher koitawala <[hidden email]> wrote:
|
In reply to this post by Rafi Aroch
Hello Rafi, Thank you for getting back. We have lifecycle rule setup for the sink and not the s3 bucket for savepoints. This was my initial hunch too but we tried restarting the job immediately after canceling them and it failed. Best, Swapnil Kumar On Sat, Aug 17, 2019 at 2:23 PM Rafi Aroch <[hidden email]> wrote:
Thanks, Swapnil Kumar |
In reply to this post by taher koitawala-2
Thank you Taher, We are not on EMR but great to know that s3 and streaming sink should be working fine based on your explanation. On Sun, Aug 18, 2019 at 8:23 AM taher koitawala <[hidden email]> wrote:
Thanks, Swapnil Kumar |
In reply to this post by Stephan Ewen
We are on 1.8 as of now will give "stop with savepoint" a try once we upgrade. I am trying to cancel the job with savepoint and restore it back again. I think there is an issue with how our s3 lifecycle is configured. Thank you for your help. On Sun, Aug 18, 2019 at 8:10 AM Stephan Ewen <[hidden email]> wrote:
Thanks, Swapnil Kumar |
Free forum by Nabble | Edit this page |