Flink to S3 streaming

pradeep s
Hi,
I have a use case to stream messages from Kafka to Amazon S3. I am not using the S3 file system approach because I need to add object tags to each object written to S3.
So I am planning to use the AWS S3 SDK. My question is how to hold the data until the buffered messages reach a few MB in size and only then write to S3. Also, which sink should I use in this case if I am writing to S3 through the S3 SDK?
Regards
Pradeep S

Re: Flink to S3 streaming

Aljoscha Krettek
Hi,
You would have to write your own SinkFunction that uses the AWS S3 SDK to write to S3. You might be interested in the work proposed in this Jira: https://issues.apache.org/jira/browse/FLINK-6306

As for buffering elements, I’m afraid you would also have to roll your own solution for now. You could use the Flink state API for that (https://ci.apache.org/projects/flink/flink-docs-release-1.3/dev/stream/state.html); that page even has an example of a buffering sink.
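
To make the buffering idea a bit more concrete, here is a minimal sketch of the size-based part only, in plain Java with no Flink or AWS dependencies. The class name `SizeBufferedWriter`, the `uploader` callback, and the newline-delimited record format are all assumptions made for illustration; in a real job the callback would wrap whatever S3 SDK call you use to put the object with its tags.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.function.Consumer;

/** Accumulates records in memory and hands them off in batches of roughly maxBytes. */
class SizeBufferedWriter {
    private final int maxBytes;               // flush threshold, e.g. a few MB
    private final Consumer<byte[]> uploader;  // hypothetical hook; would wrap the S3 SDK upload
    private final ByteArrayOutputStream buffer = new ByteArrayOutputStream();

    SizeBufferedWriter(int maxBytes, Consumer<byte[]> uploader) {
        this.maxBytes = maxBytes;
        this.uploader = uploader;
    }

    /** Appends one record plus a newline and flushes once the threshold is reached. */
    void add(String record) {
        byte[] bytes = record.getBytes(StandardCharsets.UTF_8);
        buffer.write(bytes, 0, bytes.length);
        buffer.write('\n');
        if (buffer.size() >= maxBytes) {
            flush();
        }
    }

    /** Emits whatever is buffered as one batch; does nothing when the buffer is empty. */
    void flush() {
        if (buffer.size() == 0) {
            return;
        }
        uploader.accept(buffer.toByteArray());
        buffer.reset();
    }

    /** Number of bytes currently waiting to be uploaded. */
    int bufferedBytes() {
        return buffer.size();
    }
}
```

Inside a custom SinkFunction, invoke() would call add() for each element. On its own this buffer is lost if the job fails, so the unflushed records would also need to go into Flink state (as in the buffering sink example on the state page) to be restored after recovery.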

Best,
Aljoscha
On 8. Apr 2017, at 08:09, pradeep s <[hidden email]> wrote:

Hi,
I have a use case to stream messages from Kafka to Amazon S3. I am not using the S3 file system approach because I need to add object tags to each object written to S3.
So I am planning to use the AWS S3 SDK. My question is how to hold the data until the buffered messages reach a few MB in size and only then write to S3. Also, which sink should I use in this case if I am writing to S3 through the S3 SDK?
Regards
Pradeep S