presto s3p checkpoints and local stack

presto s3p checkpoints and local stack

Marco Villalobos-2
Hi,

I got s3a working on LocalStack. The missing piece of information in the Flink documentation seems to be that the system requires a HADOOP_HOME and a core-site.xml.

The Flink documentation states that s3p (Presto) should be used for checkpointing files into S3. I am using RocksDB, which I assume also means that I should use s3p (the documentation was not specific about that). Is that assumption correct?

However, I cannot get s3p working now.

I did the following so far:

I created the s3-fs-presto plugin directory and copied the jar there from the opt directory (roughly the commands sketched below).
I am not sure where to put the configuration keys, though. The documentation states that I can just put them in my flink-conf.yaml, but I had no success.
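
Concretely, the plugin setup amounts to something like this (a sketch; the exact jar version depends on the Flink distribution, run from the Flink home directory):

mkdir -p plugins/s3-fs-presto
cp opt/flink-s3-fs-presto-*.jar plugins/s3-fs-presto/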

Where do I put the Presto configuration keys? Are there any other missing steps? Is this something that would only work in an EMR environment with a real Hive running?

# The S3 storage endpoint server. This can be used to connect to an S3-compatible storage
# system instead of AWS. When using v4 signatures, it is recommended to set this to the
# AWS region-specific endpoint (e.g., http[s]://<bucket>.s3-<AWS-region>.amazonaws.com).
hive.s3.endpoint: http://aws:4566

# Use HTTPS to communicate with the S3 API (defaults to true).
hive.s3.ssl.enabled: false

# Use path-style access for all requests to the S3-compatible storage.
# This is for S3-compatible storage that doesn’t support virtual-hosted-style access. (defaults to false)
hive.s3.path-style-access: true

But that also did not work.

Any advice would be appreciated.

-Marco Villalobos

Re: presto s3p checkpoints and local stack

Arvid Heise-4
Hi Marco,

AFAIK you don't need HADOOP_HOME or core-site.xml.

I'm also not sure where you got your config keys from. (I guess from the Presto page; they probably all work if you remove the hive. prefix. Maybe we should also support that.)

All keys with the prefix s3 or s3p (and fs.s3, fs.s3p) are routed to the Presto filesystem [1].

So it should be
s3.access-key: XXX
s3.secret-key: XXX
s3.endpoint: http://aws:4566
s3.path-style-access: true
s3.path.style.access: true (only one of these two is needed, but I don't know which, so please try both)

[1] https://ci.apache.org/projects/flink/flink-docs-stable/deployment/filesystems/s3.html#configure-access-credentials
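
For completeness, a minimal flink-conf.yaml sketch along these lines, assuming checkpoints should land in a LocalStack bucket (my-checkpoint-bucket is a placeholder; the s3p:// scheme in the checkpoint path explicitly selects the Presto filesystem):

state.backend: rocksdb
state.checkpoints.dir: s3p://my-checkpoint-bucket/checkpoints
s3.access-key: XXX
s3.secret-key: XXX
s3.endpoint: http://aws:4566
s3.path-style-access: true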

Re: presto s3p checkpoints and local stack

Marco Villalobos-2
Is it possible to use an environment-variable credentials provider?

Re: presto s3p checkpoints and local stack

Arvid Heise-4
Hi Marco,

Ideally you solve everything with IAM roles, but you can also use a credentials provider such as EnvironmentVariableCredentialsProvider [1].

The key should be
s3.aws.credentials.provider: com.amazonaws.auth.EnvironmentVariableCredentialsProvider

Remember to put the respective jar into the folder of your s3p plugin; the folder structure should look as described here [2].

Note that this is tested for s3a, so it could be that it works differently with s3p. I see that Presto usually uses presto.s3.credentials-provider.
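
Putting that together, a sketch of what it could look like (untested for s3p, as noted; if the s3. prefix is not picked up, presto.s3.credentials-provider would be the variant to try):

# flink-conf.yaml
s3.aws.credentials.provider: com.amazonaws.auth.EnvironmentVariableCredentialsProvider

# environment of the JobManager/TaskManager processes;
# EnvironmentVariableCredentialsProvider reads these standard AWS SDK variables
export AWS_ACCESS_KEY_ID=XXX
export AWS_SECRET_ACCESS_KEY=XXX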

