StreamingFileSink output formatting to CSV

Robert Cullen

I have a StreamingFileSink that writes to S3:

        final StreamingFileSink<Tuple2<String, Long>> sink =
                StreamingFileSink.forRowFormat(
                        new Path("s3://argo-artifacts/files"),
                        new SimpleStringEncoder<Tuple2<String, Long>>("UTF-8"))
                        .withBucketAssigner(new KeyBucketAssigner())
                        .withRollingPolicy(OnCheckpointRollingPolicy.build())
                        .withOutputFileConfig(config)
                        .build();

I’d like the output in CSV format so that mc or the aws client can query the data with SQL. The current output wraps each row in parentheses. Is there a way to write the rows without the parens?

(00136627-8e1e-4c84-9d8d-b6cfe9d092aa,1)
(00136627-8e1e-4c84-9d8d-b6cfe9d092aa,2)

Re: StreamingFileSink output formatting to CSV

Chesnay Schepler

This is handled by the Encoder; the one you use (SimpleStringEncoder) just calls toString() on the input element, and Tuple2.toString() is what adds the parentheses.

I don't think Flink provides a CSV Encoder, but if all you want is to remove the parentheses, then you could wrap the SimpleStringEncoder and trim the first and last characters.
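
A minimal sketch of that idea follows. It is not Flink code: the nested Encoder interface is a stand-in that only mirrors the encode(element, stream) method shape of Flink's org.apache.flink.api.common.serialization.Encoder so the example compiles on its own, and the names CsvEncoderSketch, ParenStrippingEncoder and encodeToString are hypothetical. Rather than literally wrapping SimpleStringEncoder, it does the equivalent directly: call toString(), drop one surrounding pair of parentheses, and write the row followed by a newline.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

class CsvEncoderSketch {

    // Stand-in (assumption) with the same method shape as Flink's Encoder,
    // so this sketch compiles without Flink on the classpath.
    interface Encoder<IN> {
        void encode(IN element, OutputStream stream) throws IOException;
    }

    // Hypothetical encoder: writes element.toString() without the
    // surrounding "(" and ")" that a tuple's toString() produces.
    static class ParenStrippingEncoder<IN> implements Encoder<IN> {
        @Override
        public void encode(IN element, OutputStream stream) throws IOException {
            String row = element.toString();
            // Only strip when the row really is wrapped in parentheses.
            if (row.length() >= 2 && row.startsWith("(") && row.endsWith(")")) {
                row = row.substring(1, row.length() - 1);
            }
            stream.write(row.getBytes(StandardCharsets.UTF_8));
            stream.write('\n'); // one record per line
        }
    }

    // Helper for trying the encoder outside a job: encodes one element
    // into memory and returns the bytes as a UTF-8 string.
    static String encodeToString(Object element) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try {
            new ParenStrippingEncoder<Object>().encode(element, buffer);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return new String(buffer.toByteArray(), StandardCharsets.UTF_8);
    }
}
```

In a real job the class would implement Flink's own Encoder<Tuple2<String, Long>> and be passed to StreamingFileSink.forRowFormat in place of the SimpleStringEncoder. Note that this only trims the parentheses; a field that itself contains a comma, quote or newline would still need proper CSV quoting.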


On 6/3/2021 3:45 PM, Robert Cullen wrote: