Additional options to S3 Filesystem: Interest?

Padarn Wilson-2
Hi Flink Users,

We need to expose some additional options for the S3 Hadoop filesystem: specifically, we want to set object tagging and lifecycle. This would be a fairly easy change, and we initially thought of creating a new FileSystem with very minor changes to allow this.

However, I then wondered: would others use this? Is it something that is worth raising as a Flink issue and then contributing back upstream?

Any others who would like to be able to set object tags for the s3 filesystem?

Cheers,
Padarn
Re: Additional options to S3 Filesystem: Interest?

Arvid Heise-3
Hi Padarn,

Sounds like a good addition to me. We could wait for more feedback, or you could start immediately.

The next step would be to create a JIRA and get it assigned to you.

Looking forward to your contribution

Arvid

On Sun, Oct 11, 2020 at 7:45 AM Padarn Wilson <[hidden email]> wrote:
[...]

--

Arvid Heise | Senior Java Developer


Follow us @VervericaData

--

Join Flink Forward - The Apache Flink Conference

Stream Processing | Event Driven | Real Time

--

Ververica GmbH | Invalidenstrasse 115, 10115 Berlin, Germany

--

Ververica GmbH
Registered at Amtsgericht Charlottenburg: HRB 158244 B
Managing Directors: Timothy Alexander Steinert, Yip Park Tung Jason, Ji (Toni) Cheng   
Re: Additional options to S3 Filesystem: Interest?

Dan Diephouse
In reply to this post by Padarn Wilson-2
We use the StreamingFileSink. An option to expire files after some time period would certainly be welcome. (I could probably figure out a way to do this from the S3 admin UI too though)

--
Dan Diephouse
@dandiep
Re: Additional options to S3 Filesystem: Interest?

Padarn Wilson-2
Thanks for the feedback. I've created a JIRA ticket here: https://issues.apache.org/jira/browse/FLINK-19589

@Dan: This would indeed make it easier to set a lifetime property on objects created by Flink. However, if you want to apply it to all objects in a given bucket, you can set a bucket-wide policy instead. The reason I want this is that we have a shared bucket and wish to tag objects based on which pipeline produces them.
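
To make the difference between the two approaches concrete, here is a minimal sketch (not part of the original thread) of what a bucket-wide expiration rule and tag-filtered rules could look like. The dictionaries follow the shape of an S3 lifecycle configuration; the `pipeline` tag key, the rule IDs, the pipeline names, and the retention periods are hypothetical and only for illustration.

```python
import json


def expiration_rule(rule_id, days, tag=None):
    """Build one S3 lifecycle rule that expires objects after `days` days.

    With tag=None the rule applies to every object in the bucket;
    with tag=(key, value) it applies only to objects carrying that tag.
    """
    if tag is None:
        rule_filter = {"Prefix": ""}  # empty prefix matches all objects
    else:
        key, value = tag
        rule_filter = {"Tag": {"Key": key, "Value": value}}
    return {
        "ID": rule_id,
        "Filter": rule_filter,
        "Status": "Enabled",
        "Expiration": {"Days": days},
    }


# Bucket-wide policy: one rule for everything in the bucket.
bucket_wide = {"Rules": [expiration_rule("expire-everything", 30)]}

# Shared bucket: a separate rule per pipeline, selected by a
# hypothetical "pipeline" object tag.
per_pipeline = {
    "Rules": [
        expiration_rule("expire-clickstream", 7, tag=("pipeline", "clickstream")),
        expiration_rule("expire-billing", 365, tag=("pipeline", "billing")),
    ]
}

print(json.dumps(per_pipeline, indent=2))
```

Either dictionary could be passed as the `LifecycleConfiguration` argument of boto3's `put_bucket_lifecycle_configuration`. The tag-filtered rules only take effect if the objects actually carry the tag, which is what the proposed Flink change would need to provide.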

On Tue, Oct 13, 2020 at 4:13 AM Dan Diephouse <[hidden email]> wrote:
[...]
Re: Additional options to S3 Filesystem: Interest?

Arvid Heise-3
Hi Padarn,

I assigned the ticket to you, so you can start working on it. Here are some contribution guidelines [1] in case it's your first contribution.

Basically, you will need to open a PR whose title starts with the ticket ID and the component, so the prefix should be "[FLINK-19589][s3]" (the same goes for your commit messages).
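
As an illustration of that naming convention, the following is a small sketch of a local check for the prefix. It is not an official Flink tool; the regular expression, the allowed characters in the component name, and the example title are assumptions made for this example.

```python
import re

# Assumed pattern: "[FLINK-<number>][<component>] <summary>".
# The character set allowed for the component is a guess.
PREFIX = re.compile(r"^\[FLINK-(\d+)\]\[([a-z0-9/-]+)\] \S")


def parse_prefix(title):
    """Return (ticket_number, component) if the title has the prefix, else None."""
    match = PREFIX.match(title)
    if match is None:
        return None
    return int(match.group(1)), match.group(2)


# Hypothetical PR or commit titles.
print(parse_prefix("[FLINK-19589][s3] Support object tags"))
print(parse_prefix("Support object tags"))
```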

Feel free to reach out to me if you have any questions about the process. All discussions about the feature should be on the ticket, so everyone can see it.


On Tue, Oct 13, 2020 at 3:37 AM Padarn Wilson <[hidden email]> wrote:
[...]
Re: Additional options to S3 Filesystem: Interest?

Padarn Wilson-2
Great. Thanks.

On Tue, Oct 13, 2020 at 4:29 PM Arvid Heise <[hidden email]> wrote:
[...]