error accessing S3 bucket 1.12

error accessing S3 bucket 1.12

Billy Bain
I'm trying to use readTextFile() to access files in S3. I have verified that the S3 key and secret are clean, and the S3 path is similar to com.somepath/somefile (names changed to protect the guilty).

Any idea what I'm missing? 

2021-01-13 12:12:43,836 DEBUG org.apache.flink.streaming.api.functions.source.ContinuousFileMonitoringFunction [] - Opened ContinuousFileMonitoringFunction (taskIdx= 0) for path: s3://com.somepath/somefile
2021-01-13 12:12:43,843 DEBUG org.apache.flink.fs.s3.common.AbstractS3FileSystemFactory [] - Creating S3 file system backed by Hadoop s3a file system
2021-01-13 12:12:43,844 DEBUG org.apache.flink.fs.s3.common.AbstractS3FileSystemFactory [] - Loading Hadoop configuration for Hadoop s3a file system
2021-01-13 12:12:43,926 DEBUG org.apache.flink.fs.s3hadoop.common.HadoopConfigLoader [] - Adding Flink config entry for s3.access-key as fs.s3a.access-key to Hadoop config
2021-01-13 12:12:43,926 DEBUG org.apache.flink.fs.s3hadoop.common.HadoopConfigLoader [] - Adding Flink config entry for s3.secret-key as fs.s3a.secret-key to Hadoop config
2021-01-13 12:12:43,944 DEBUG org.apache.flink.streaming.runtime.tasks.StreamTask [] - Invoking Split Reader: Custom File Source -> (Timestamps/Watermarks, Map -> Filter -> Sink: Unnamed) (1/1)#0
2021-01-13 12:12:43,944 DEBUG org.apache.flink.streaming.api.operators.BackendRestorerProcedure [] - Creating operator state backend for TimestampsAndWatermarksOperator_1cf40e099136da16c66c61032de62905_(1/1) with empty state.
2021-01-13 12:12:43,946 DEBUG org.apache.flink.streaming.api.operators.BackendRestorerProcedure [] - Creating operator state backend for StreamSink_d91236bbbed306c2379eac4982246f1f_(1/1) with empty state.
2021-01-13 12:12:43,955 DEBUG org.apache.hadoop.conf.Configuration [] - Reloading 1 existing configurations
2021-01-13 12:12:43,961 DEBUG org.apache.flink.fs.s3hadoop.S3FileSystemFactory [] - Using scheme s3://com.somepath/somefile for s3a file system backing the S3 File System
2021-01-13 12:12:43,965 DEBUG org.apache.flink.streaming.api.functions.source.ContinuousFileMonitoringFunction [] - Closed File Monitoring Source for path: s3://com.somepath/somefile.
2021-01-13 12:12:43,967 WARN org.apache.flink.runtime.taskmanager.Task [] - Source: Custom File Source (1/1)#0 (1d75ae07abbd65f296c55a61a400c59f) switched from RUNNING to FAILED.
java.io.IOException: null uri host. This can be caused by unencoded / in the password string
    at org.apache.flink.fs.s3.common.AbstractS3FileSystemFactory.create(AbstractS3FileSystemFactory.java:163) ~[blob_p-e297dae3da73ba51c20f14193b5ae6e09694422a-293a7d95166eee9a9b2329b71764cf67:?]
    at org.apache.flink.core.fs.PluginFileSystemFactory.create(PluginFileSystemFactory.java:61) ~[flink-dist_2.11-1.12.0.jar:1.12.0]
    at org.apache.flink.core.fs.FileSystem.getUnguardedFileSystem(FileSystem.java:468) ~[flink-dist_2.11-1.12.0.jar:1.12.0]
    at org.apache.flink.core.fs.FileSystem.get(FileSystem.java:389) ~[flink-dist_2.11-1.12.0.jar:1.12.0]
    at org.apache.flink.streaming.api.functions.source.ContinuousFileMonitoringFunction.run(ContinuousFileMonitoringFunction.java:196) ~[flink-dist_2.11-1.12.0.jar:1.12.0]
    at org.apache.flink.streaming.api.operators.StreamSource.run(StreamSource.java:100) ~[flink-dist_2.11-1.12.0.jar:1.12.0]
    at org.apache.flink.streaming.api.operators.StreamSource.run(StreamSource.java:63) ~[flink-dist_2.11-1.12.0.jar:1.12.0]
    at org.apache.flink.streaming.runtime.tasks.SourceStreamTask$LegacySourceFunctionThread.run(SourceStreamTask.java:215) ~[flink-dist_2.11-1.12.0.jar:1.12.0]
Caused by: java.lang.NullPointerException: null uri host. This can be caused by unencoded / in the password string
    at java.util.Objects.requireNonNull(Objects.java:246) ~[?:?]
    at org.apache.hadoop.fs.s3native.S3xLoginHelper.buildFSURI(S3xLoginHelper.java:69) ~[blob_p-e297dae3da73ba51c20f14193b5ae6e09694422a-293a7d95166eee9a9b2329b71764cf67:?]
    at org.apache.hadoop.fs.s3a.S3AFileSystem.setUri(S3AFileSystem.java:467) ~[blob_p-e297dae3da73ba51c20f14193b5ae6e09694422a-293a7d95166eee9a9b2329b71764cf67:?]
    at org.apache.hadoop.fs.s3a.S3AFileSystem.initialize(S3AFileSystem.java:234) ~[blob_p-e297dae3da73ba51c20f14193b5ae6e09694422a-293a7d95166eee9a9b2329b71764cf67:?]
    at org.apache.flink.fs.s3.common.AbstractS3FileSystemFactory.create(AbstractS3FileSystemFactory.java:126) ~[blob_p-e297dae3da73ba51c20f14193b5ae6e09694422a-293a7d95166eee9a9b2329b71764cf67:?]
    ... 7 more



--
Wayne D. Young
aka Billy Bob Bain
[hidden email]

Re: error accessing S3 bucket 1.12

Dawid Wysakowicz-2

Hi Billy,

I think you might be hitting the same problem as described in this thread[1]. Does your bucket name meet all the naming requirements described here[2]? For example, does it contain an underscore?

Best,

Dawid

[1] http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/Unable-to-set-S3-like-object-storage-for-state-backend-td28362.html

[2] https://docs.aws.amazon.com/AmazonS3/latest/dev/BucketRestrictions.html
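
For anyone landing here with the same trace: the `Objects.requireNonNull` frame under `S3xLoginHelper.buildFSURI` suggests that the URI built from the path came back without a host. Below is a minimal sketch for checking, outside Flink, what `java.net.URI` makes of a given path. The class name `S3UriHostCheck`, the helper `hostOf`, and the bucket name `my_bucket` are made up for illustration; only `java.net.URI` is real API, and this is not Hadoop's actual code.

```java
import java.net.URI;

final class S3UriHostCheck {

    // Returns the host that java.net.URI extracts from the path,
    // or null when the authority part is not a valid hostname.
    static String hostOf(String path) {
        return URI.create(path).getHost();
    }

    public static void main(String[] args) {
        String[] paths = {
            "s3://my_bucket/somefile",                // hypothetical name with an underscore
            "s3://com-this-bucket-succeeds/somefile"  // dashed name from this thread
        };
        for (String path : paths) {
            System.out.println(path + " -> host=" + hostOf(path));
        }
    }
}
```

The underscore name yields a `null` host, which appears to be the situation in thread [1]. It is worth running this against the real bucket name, since the anonymised names in this thread may parse differently from the originals.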

On 13/01/2021 19:20, Billy Bain wrote:
I'm trying to use readTextFile() to access files in S3. I have verified that the S3 key and secret are clean, and the S3 path is similar to com.somepath/somefile (names changed to protect the guilty).

Any idea what I'm missing? 


Re: error accessing S3 bucket 1.12

Billy Bain
Dawid, 
We found the issue. Our bucket has periods in the name:
  com.this.bucket.fails
Recreating the bucket with dashes instead of periods solved it:
  com-this-bucket-succeeds

This seems crazy, but the bucket naming guidelines are clear:


For best compatibility, we recommend that you avoid using dots (.) in bucket names, except for buckets that are used only for static website hosting. If you include dots in a bucket's name, you can't use virtual-host-style addressing over HTTPS, unless you perform your own certificate validation. This is because the security certificates used for virtual hosting of buckets don't work for buckets with dots in their names.
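
In case it helps others, here is a rough sketch of a pre-flight check that flags bucket names likely to cause this kind of trouble. It assumes the general S3 naming rules (3 to 63 characters, lowercase letters, digits and hyphens, starting and ending with a letter or digit) and deliberately rejects dots, following the recommendation quoted above. The class name `BucketNameCheck` and the method `isDotFreeBucketName` are hypothetical, not part of Flink, Hadoop, or the AWS SDK.

```java
import java.util.regex.Pattern;

final class BucketNameCheck {

    // 3-63 characters: lowercase letters, digits, and hyphens only,
    // beginning and ending with a letter or digit. Dots are excluded on purpose.
    private static final Pattern DOT_FREE =
            Pattern.compile("[a-z0-9][a-z0-9-]{1,61}[a-z0-9]");

    static boolean isDotFreeBucketName(String name) {
        return name != null && DOT_FREE.matcher(name).matches();
    }

    public static void main(String[] args) {
        String[] names = {"com.this.bucket.fails", "com-this-bucket-succeeds"};
        for (String name : names) {
            System.out.println(name + " -> " + isDotFreeBucketName(name));
        }
    }
}
```

This is stricter than what S3 itself accepts, since dots are legal in bucket names; it only encodes the "avoid dots" advice.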


On Thu, Jan 14, 2021 at 11:12 AM Dawid Wysakowicz <[hidden email]> wrote:

Hi Billy,

I think you might be hitting the same problem as described in this thread[1]. Does your bucket name meet all the naming requirements described here[2]? For example, does it contain an underscore?

Best,

Dawid
