StreamingFileSink only writes data to MINIO during savepoint


Robert Cullen
On my Kubernetes cluster, when I set the StreamingFileSink to write to a local S3-compatible store (MinIO, 500 GB), it only writes the data after I execute a savepoint.

The expected behavior is to write the data in real time. I'm guessing the memory requirements have not been met, or a configuration setting in MinIO is missing? Any ideas?

--
Robert Cullen
240-475-4490
Re: StreamingFileSink only writes data to MINIO during savepoint

David Anderson-4
The StreamingFileSink requires that you have checkpointing enabled. I'm guessing that you don't have checkpointing enabled, since that would explain the behavior you are seeing.

The relevant section of the docs [1] explains:

Checkpointing needs to be enabled when using the StreamingFileSink. Part files can only be finalized on successful checkpoints. If checkpointing is disabled, part files will forever stay in the in-progress or the pending state, and cannot be safely read by downstream systems.
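
For illustration, here is a minimal sketch of what enabling checkpointing alongside a StreamingFileSink might look like in a Java job. The class name, the 60-second interval, the bucket path, and the socket source are all assumptions made up for the example, not details from the original post; it also assumes an S3 filesystem plugin is already configured to point at MinIO. If checkpointing is indeed off, this would also be consistent with files showing up only on a savepoint, since a savepoint goes through the same snapshot mechanism that finalizes part files.

```java
import org.apache.flink.api.common.serialization.SimpleStringEncoder;
import org.apache.flink.core.fs.Path;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.api.functions.sink.filesystem.StreamingFileSink;

public class CheckpointedFileSinkJob {

    // Hypothetical values chosen for the example, not taken from the thread.
    static final long CHECKPOINT_INTERVAL_MS = 60_000L;
    static final String OUTPUT_PATH = "s3://my-bucket/output";

    // Creates an environment with periodic checkpointing turned on.
    // Part files are finalized only when a checkpoint completes.
    static StreamExecutionEnvironment createEnvironment() {
        StreamExecutionEnvironment env =
            StreamExecutionEnvironment.getExecutionEnvironment();
        env.enableCheckpointing(CHECKPOINT_INTERVAL_MS);
        return env;
    }

    // Builds a row-format sink that writes each record as a UTF-8 line.
    static StreamingFileSink<String> createSink() {
        return StreamingFileSink
            .forRowFormat(new Path(OUTPUT_PATH), new SimpleStringEncoder<String>("UTF-8"))
            .build();
    }

    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = createEnvironment();

        // Placeholder unbounded source; replace with the job's real source.
        DataStream<String> lines = env.socketTextStream("localhost", 9999);

        lines.addSink(createSink());
        env.execute("checkpointed-file-sink");
    }
}
```

With a setup along these lines, finished part files should become visible roughly once per checkpoint interval, so a shorter interval means lower latency to the bucket at the cost of more, smaller files.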

Regards,
David


On Fri, May 28, 2021 at 5:26 PM Robert Cullen <[hidden email]> wrote:
On my kubernetes cluster when I set the StreamingFileSink to write to a local instance of S3 (MINIO - 500 GB) it only writes the data after I execute a savepoint

The expected behavior is to write the data in real-time. I'm guessing the memory requirements have not been met or a configuration in MINIO is missing?  Any ideas?

--
Robert Cullen
240-475-4490