The Flink docs show how to set up Flink's internal file system operations to use the S3FileSystem (the Stack Overflow question actually shows that it is working, see the answer there).
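
For reference, the setup described there boils down to pointing Flink at a Hadoop configuration that registers the S3 file system implementation. A rough sketch (paths and keys below are placeholders, and the exact property names depend on your Hadoop version):

# flink-conf.yaml
fs.hdfs.hadoopconf: /path/to/etc/hadoop

<!-- /path/to/etc/hadoop/core-site.xml -->
<configuration>
  <property>
    <name>fs.s3.impl</name>
    <value>org.apache.hadoop.fs.s3a.S3AFileSystem</value>
  </property>
  <property>
    <name>fs.s3a.access.key</name>
    <value>YOUR_ACCESS_KEY</value>
  </property>
  <property>
    <name>fs.s3a.secret.key</name>
    <value>YOUR_SECRET_KEY</value>
  </property>
</configuration>

Depending on the Hadoop version, you may also need the hadoop-aws jar (and the AWS SDK it pulls in) on Flink's classpath.
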
This configuration is independent of what you are doing in your user code. If you want to use your own S3-based clients, you will have to package them with your user code JARs. Then you should be ready to go. That's how you want to use them, right?
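
To make the second point concrete, here is a minimal sketch (mine, not from the docs) of a batch job that talks to S3 through its own AWS SDK client rather than through Flink's file system abstraction. It assumes the aws-java-sdk-s3 dependency is bundled into the user-code JAR (e.g. via the maven-shade-plugin); bucket and key names are placeholders:

import com.amazonaws.services.s3.AmazonS3;
import com.amazonaws.services.s3.AmazonS3ClientBuilder;
import com.amazonaws.services.s3.model.S3Object;
import com.amazonaws.util.IOUtils;
import org.apache.flink.api.common.functions.RichMapFunction;
import org.apache.flink.api.java.DataSet;
import org.apache.flink.api.java.ExecutionEnvironment;
import org.apache.flink.configuration.Configuration;

public class OwnS3ClientJob {

    public static void main(String[] args) throws Exception {
        ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();

        // Object keys to fetch; in practice these could come from a listing step.
        DataSet<String> keys = env.fromElements("events/part-0000", "events/part-0001");

        // Each key is fetched with a plain AWS SDK client inside the map function.
        DataSet<String> contents = keys.map(new FetchFromS3("my-bucket"));
        contents.print();
    }

    public static class FetchFromS3 extends RichMapFunction<String, String> {

        private final String bucket;
        private transient AmazonS3 s3;

        FetchFromS3(String bucket) {
            this.bucket = bucket;
        }

        @Override
        public void open(Configuration parameters) {
            // The client is not serializable, so create it per task instance;
            // credentials come from the usual AWS provider chain (env vars, profile, IAM role).
            s3 = AmazonS3ClientBuilder.defaultClient();
        }

        @Override
        public String map(String key) throws Exception {
            try (S3Object object = s3.getObject(bucket, key)) {
                return IOUtils.toString(object.getObjectContent());
            }
        }
    }
}

The important part is the packaging: as long as the SDK classes end up in your fat JAR, Flink treats them like any other user code dependency.
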
– Ufuk
On 11 November 2016 at 17:35:22, Steve Morin ([hidden email]) wrote:
> Use-case: I am trying to see how to use Flink with S3, where we use our own
> client libraries or things like AWS Firehose to put data into S3, then
> process it in batch using Flink. These clients are putting data into S3
> without HDFS - aka we aren't using HDFS on top of S3.
>
> Most of what I can find referenced [1] uses HDFS backed by S3
> (S3AFileSystem, NativeS3FileSystem).
>
> I found one reference [2] saying that using the S3 file system (S3FileSystem) doesn't work.
>
> Can anyone with Flink experience help give any insight on this?
>
> References:
>
> - [1] - https://ci.apache.org/projects/flink/flink-docs-release-1.0/setup/aws.html
> - [2] - http://stackoverflow.com/questions/32959790/run-apache-flink-with-amazon-s3
>
>
> --
> *Steve Morin | Managing Partner - CTO*
>
> *Nvent*
>
> O 800-407-1156 ext 803 | M 347-453-5579
>
>
> [hidden email]
>
> *Enabling the Data Driven Enterprise*
> *(Ask us how we can set up scalable open source realtime billion+ event/data
> collection/analytics infrastructure in weeks)*
>
> Service Areas: Management & Strategy Consulting | Data Engineering | Data
> Science & Visualization
>