CSV StreamingFileSink

CSV StreamingFileSink

austin.ce
Hey all,

Has anyone had success using the StreamingFileSink[1] to write CSV files? And if so, what about compressed (gzipped, ideally) files? Which libraries did you use?


Best,
Austin

[1]: https://ci.apache.org/projects/flink/flink-docs-stable/dev/connectors/streamfile_sink.html
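
One possible approach is a custom BulkWriter that wraps the part-file stream in a GZIPOutputStream and writes one CSV line per record; a minimal sketch, where "MyRecord" and its getters are placeholder names and the CSV formatting is deliberately naive:

    import org.apache.flink.api.common.serialization.BulkWriter;
    import org.apache.flink.core.fs.FSDataOutputStream;

    import java.io.IOException;
    import java.nio.charset.StandardCharsets;
    import java.util.zip.GZIPOutputStream;

    // Writes each record as one CSV line into a gzip-compressed part file.
    // MyRecord is a placeholder for the actual record type.
    public class GzipCsvWriterFactory implements BulkWriter.Factory<MyRecord> {

        @Override
        public BulkWriter<MyRecord> create(FSDataOutputStream out) throws IOException {
            final GZIPOutputStream gzip = new GZIPOutputStream(out, true);
            return new BulkWriter<MyRecord>() {
                @Override
                public void addElement(MyRecord record) throws IOException {
                    // Naive CSV formatting; a real job would use a CSV library
                    // (e.g. Jackson CSV or Commons CSV) for quoting and escaping.
                    String line = record.getId() + "," + record.getValue() + "\n";
                    gzip.write(line.getBytes(StandardCharsets.UTF_8));
                }

                @Override
                public void flush() throws IOException {
                    gzip.flush();
                }

                @Override
                public void finish() throws IOException {
                    // Write the gzip trailer but leave the underlying stream open;
                    // the sink finalizes and closes the part file itself.
                    gzip.finish();
                }
            };
        }
    }

A factory like this would be handed to StreamingFileSink.forBulkFormat(...), which rolls part files on every checkpoint.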


Re: CSV StreamingFileSink

austin.ce
Following up on this -- does anyone know if it's possible to stream individual files to a directory using the StreamingFileSink? For instance, if I want all records that come in during a certain day to be partitioned into daily directories:

2020-02-18/
   large-file-1.txt
   large-file-2.txt
2020-02-19/
   large-file-3.txt

Or is there another way to accomplish this?

Thanks!
Austin

On Tue, Feb 18, 2020 at 5:33 PM Austin Cawley-Edwards <[hidden email]> wrote:
Hey all,

Has anyone had success using the StreamingFileSink[1] to write CSV files? And if so, what about compressed (gzipped, ideally) files? Which libraries did you use?


Best,
Austin



Re: CSV StreamingFileSink

Timo Walther
Hi Austin,

the StreamingFileSink allows bucketing the output data.

This should help for your use case:

https://ci.apache.org/projects/flink/flink-docs-stable/dev/connectors/streamfile_sink.html#bucket-assignment

Regards,
Timo
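
As a minimal sketch of the bucket-assignment approach, the built-in DateTimeBucketAssigner produces exactly this kind of daily layout (based on processing time, i.e. when records arrive at the sink); this assumes a row-encoded sink and a DataStream<String> named "stream":

    import org.apache.flink.api.common.serialization.SimpleStringEncoder;
    import org.apache.flink.core.fs.Path;
    import org.apache.flink.streaming.api.functions.sink.filesystem.StreamingFileSink;
    import org.apache.flink.streaming.api.functions.sink.filesystem.bucketassigners.DateTimeBucketAssigner;

    // Buckets become daily directories such as 2020-02-18/ and 2020-02-19/
    // under the base path; part files roll according to the default policy.
    StreamingFileSink<String> sink = StreamingFileSink
        .forRowFormat(new Path("/data/output"), new SimpleStringEncoder<String>("UTF-8"))
        .withBucketAssigner(new DateTimeBucketAssigner<>("yyyy-MM-dd"))
        .build();

    stream.addSink(sink);

For buckets derived from a timestamp inside each record rather than the wall clock, a custom BucketAssigner can return the formatted date as the bucket id.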


On 19.02.20 01:00, Austin Cawley-Edwards wrote:

> Following up on this -- does anyone know if it's possible to stream
> individual files to a directory using the StreamingFileSink? For
> instance, if I want all records that come in during a certain day to be
> partitioned into daily directories:
>
> 2020-02-18/
>     large-file-1.txt
>     large-file-2.txt
> 2020-02-19/
>     large-file-3.txt
>
> Or is there another way to accomplish this?
>
> Thanks!
> Austin
>
> On Tue, Feb 18, 2020 at 5:33 PM Austin Cawley-Edwards
> <[hidden email] <mailto:[hidden email]>> wrote:
>
>     Hey all,
>
>     Has anyone had success using the StreamingFileSink[1] to write CSV
>     files? And if so, what about compressed (gzipped, ideally) files?
>     Which libraries did you use?
>
>
>     Best,
>     Austin
>
>
>     [1]:
>     https://ci.apache.org/projects/flink/flink-docs-stable/dev/connectors/streamfile_sink.html
>


Re: CSV StreamingFileSink

austin.ce
Hey Timo,

Thanks for the assignment link! Looks like most of my issues can be solved by getting better acquainted with the Java file APIs rather than anything in Flink-land.


Best,
Austin
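
Putting the two pieces together, gzipped CSV part files in daily directories might look roughly like this (again assuming the hypothetical GzipCsvWriterFactory and MyRecord sketched above; bulk formats roll part files on every checkpoint):

    import org.apache.flink.core.fs.Path;
    import org.apache.flink.streaming.api.functions.sink.filesystem.StreamingFileSink;
    import org.apache.flink.streaming.api.functions.sink.filesystem.bucketassigners.DateTimeBucketAssigner;

    // Gzipped CSV part files, partitioned into directories like 2020-02-18/.
    StreamingFileSink<MyRecord> sink = StreamingFileSink
        .forBulkFormat(new Path("/data/output"), new GzipCsvWriterFactory())
        .withBucketAssigner(new DateTimeBucketAssigner<>("yyyy-MM-dd"))
        .build();

    stream.addSink(sink);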

On Wed, Feb 19, 2020 at 6:48 AM Timo Walther <[hidden email]> wrote:
Hi Austin,

the StreamingFileSink allows bucketing the output data.

This should help for your use case:

https://ci.apache.org/projects/flink/flink-docs-stable/dev/connectors/streamfile_sink.html#bucket-assignment

Regards,
Timo


On 19.02.20 01:00, Austin Cawley-Edwards wrote:
> Following up on this -- does anyone know if it's possible to stream
> individual files to a directory using the StreamingFileSink? For
> instance, if I want all records that come in during a certain day to be
> partitioned into daily directories:
>
> 2020-02-18/
>     large-file-1.txt
>     large-file-2.txt
> 2020-02-19/
>     large-file-3.txt
>
> Or is there another way to accomplish this?
>
> Thanks!
> Austin
>
> On Tue, Feb 18, 2020 at 5:33 PM Austin Cawley-Edwards
> <[hidden email] <mailto:[hidden email]>> wrote:
>
>     Hey all,
>
>     Has anyone had success using the StreamingFileSink[1] to write CSV
>     files? And if so, what about compressed (gzipped, ideally) files?
>     Which libraries did you use?
>
>
>     Best,
>     Austin
>
>
>     [1]:
>     https://ci.apache.org/projects/flink/flink-docs-stable/dev/connectors/streamfile_sink.html
>