Writing to S3

classic Classic list List threaded Threaded
3 messages Options
Reply | Threaded
Open this post in threaded view
|

Writing to S3

Steve Bistline
I am trying to write out to S3 from Flink with the following code and getting the error below. Tried adding the parser as a dependency, etc.

Any help would be appreciated

Table result = tableEnv.sql("SELECT 'Alert ',t_sKey, t_deviceID, t_sValue FROM SENSORS WHERE t_sKey='TEMPERATURE' AND t_sValue > " + TEMPERATURE_THRESHOLD);
tableEnv.toAppendStream(result, Row.class).print();
// Write to S3 bucket
DataStream<Row> dsRow = tableEnv.toAppendStream(result, Row.class);
dsRow.writeAsText("s3://csv-lifeai-ai/flink-alerts");

===========================

avax.xml.parsers.FactoryConfigurationError: Provider for class javax.xml.parsers.DocumentBuilderFactory cannot be created
	at javax.xml.parsers.FactoryFinder.findServiceProvider(FactoryFinder.java:311)
	at javax.xml.parsers.FactoryFinder.find(FactoryFinder.java:267)
	at javax.xml.parsers.DocumentBuilderFactory.newInstance(DocumentBuilderFactory.java:120)
	at org.apache.flink.fs.s3hadoop.shaded.org.apache.hadoop.conf.Configuration.loadResource(Configuration.java:2565)
	at org.apache.flink.fs.s3hadoop.shaded.org.apache.hadoop.conf.Configuration.loadResources(Configuration.java:2541)
	at org.apache.flink.fs.s3hadoop.shaded.org.apache.hadoop.conf.Configuration.getProps(Configuration.java:2424)
	at org.apache.flink.fs.s3hadoop.shaded.org.apache.hadoop.conf.Configuration.get(Configuration.java:1238)
	at org.apache.flink.fs.s3hadoop.S3FileSystemFactory.create(S3FileSystemFactory.java:98)
	at org.apache.flink.core.fs.FileSystem.getUnguardedFileSystem(FileSystem.java:397)
	at org.apache.flink.core.fs.FileSystem.get(FileSystem.java:320)
	at org.apache.flink.core.fs.Path.getFileSystem(Path.java:298)
	at org.apache.flink.api.common.io.FileOutputFormat.open(FileOutputFormat.java:218)
	at org.apache.flink.api.java.io.TextOutputFormat.open(TextOutputFormat.java:88)
	at org.apache.flink.streaming.api.functions.sink.OutputFormatSinkFunction.open(OutputFormatSinkFunction.java:64)
	at org.apache.flink.api.common.functions.util.FunctionUtils.openFunction(FunctionUtils.java:36)
	at org.apache.flink.streaming.api.operators.AbstractUdfStreamOperator.open(AbstractUdfStreamOperator.java:102)
	at org.apache.flink.streaming.api.operators.StreamSink.open(StreamSink.java:48)
	at org.apache.flink.streaming.runtime.tasks.StreamTask.openAllOperators(StreamTask.java:420)
	at org.apache.flink.streaming.runtime.tasks.StreamTask.invoke(StreamTask.java:296)
	at org.apache.flink.runtime.taskmanager.Task.run(Task.java:703)
	at java.lang.Thread.run(Thread.java:748)
Reply | Threaded
Open this post in threaded view
|

Re: Writing to S3

Ken Krugler
Hi Steve,


I see that you have classes like org.apache.flink.fs.s3hadoop.shaded.org.apache.hadoop.conf.Configuration in your jar.

Which makes me think you’re not excluding those properly.

— Ken


On Nov 15, 2018, at 3:58 PM, Steve Bistline <[hidden email]> wrote:

I am trying to write out to S3 from Flink with the following code and getting the error below. Tried adding the parser as a dependency, etc.

Any help would be appreciated

Table result = tableEnv.sql("SELECT 'Alert ',t_sKey, t_deviceID, t_sValue FROM SENSORS WHERE t_sKey='TEMPERATURE' AND t_sValue > " + TEMPERATURE_THRESHOLD);
tableEnv.toAppendStream(result, Row.class).print();
// Write to S3 bucket
DataStream<Row> dsRow = tableEnv.toAppendStream(result, Row.class);
dsRow.writeAsText("<a href="s3://csv-lifeai-ai/flink-alerts" class="">s3://csv-lifeai-ai/flink-alerts");

===========================

avax.xml.parsers.FactoryConfigurationError: Provider for class javax.xml.parsers.DocumentBuilderFactory cannot be created
	at javax.xml.parsers.FactoryFinder.findServiceProvider(FactoryFinder.java:311)
	at javax.xml.parsers.FactoryFinder.find(FactoryFinder.java:267)
	at javax.xml.parsers.DocumentBuilderFactory.newInstance(DocumentBuilderFactory.java:120)
	at org.apache.flink.fs.s3hadoop.shaded.org.apache.hadoop.conf.Configuration.loadResource(Configuration.java:2565)
	at org.apache.flink.fs.s3hadoop.shaded.org.apache.hadoop.conf.Configuration.loadResources(Configuration.java:2541)
	at org.apache.flink.fs.s3hadoop.shaded.org.apache.hadoop.conf.Configuration.getProps(Configuration.java:2424)
	at org.apache.flink.fs.s3hadoop.shaded.org.apache.hadoop.conf.Configuration.get(Configuration.java:1238)
	at org.apache.flink.fs.s3hadoop.S3FileSystemFactory.create(S3FileSystemFactory.java:98)
	at org.apache.flink.core.fs.FileSystem.getUnguardedFileSystem(FileSystem.java:397)
	at org.apache.flink.core.fs.FileSystem.get(FileSystem.java:320)
	at org.apache.flink.core.fs.Path.getFileSystem(Path.java:298)
	at org.apache.flink.api.common.io.FileOutputFormat.open(FileOutputFormat.java:218)
	at org.apache.flink.api.java.io.TextOutputFormat.open(TextOutputFormat.java:88)
	at org.apache.flink.streaming.api.functions.sink.OutputFormatSinkFunction.open(OutputFormatSinkFunction.java:64)
	at org.apache.flink.api.common.functions.util.FunctionUtils.openFunction(FunctionUtils.java:36)
	at org.apache.flink.streaming.api.operators.AbstractUdfStreamOperator.open(AbstractUdfStreamOperator.java:102)
	at org.apache.flink.streaming.api.operators.StreamSink.open(StreamSink.java:48)
	at org.apache.flink.streaming.runtime.tasks.StreamTask.openAllOperators(StreamTask.java:420)
	at org.apache.flink.streaming.runtime.tasks.StreamTask.invoke(StreamTask.java:296)
	at org.apache.flink.runtime.taskmanager.Task.run(Task.java:703)
	at java.lang.Thread.run(Thread.java:748)

--------------------------
Ken Krugler
+1 530-210-6378
http://www.scaleunlimited.com
Custom big data solutions & training
Flink, Solr, Hadoop, Cascading & Cassandra

Reply | Threaded
Open this post in threaded view
|

Re: Writing to S3

Steve Bistline
Hi Ken,

Thank you for the link... I had just found this and when I removed the Hadoop dependencies ( not using in this project anyway ) things worked fine. 

Now just trying to figure out the credentials.

Thanks,

Steve

On Thu, Nov 15, 2018 at 7:12 PM Ken Krugler <[hidden email]> wrote:
Hi Steve,


I see that you have classes like org.apache.flink.fs.s3hadoop.shaded.org.apache.hadoop.conf.Configuration in your jar.

Which makes me think you’re not excluding those properly.

— Ken


On Nov 15, 2018, at 3:58 PM, Steve Bistline <[hidden email]> wrote:

I am trying to write out to S3 from Flink with the following code and getting the error below. Tried adding the parser as a dependency, etc.

Any help would be appreciated

Table result = tableEnv.sql("SELECT 'Alert ',t_sKey, t_deviceID, t_sValue FROM SENSORS WHERE t_sKey='TEMPERATURE' AND t_sValue > " + TEMPERATURE_THRESHOLD);
tableEnv.toAppendStream(result, Row.class).print();
// Write to S3 bucket
DataStream<Row> dsRow = tableEnv.toAppendStream(result, Row.class);
dsRow.writeAsText("s3://csv-lifeai-ai/flink-alerts");

===========================

avax.xml.parsers.FactoryConfigurationError: Provider for class javax.xml.parsers.DocumentBuilderFactory cannot be created
	at javax.xml.parsers.FactoryFinder.findServiceProvider(FactoryFinder.java:311)
	at javax.xml.parsers.FactoryFinder.find(FactoryFinder.java:267)
	at javax.xml.parsers.DocumentBuilderFactory.newInstance(DocumentBuilderFactory.java:120)
	at org.apache.flink.fs.s3hadoop.shaded.org.apache.hadoop.conf.Configuration.loadResource(Configuration.java:2565)
	at org.apache.flink.fs.s3hadoop.shaded.org.apache.hadoop.conf.Configuration.loadResources(Configuration.java:2541)
	at org.apache.flink.fs.s3hadoop.shaded.org.apache.hadoop.conf.Configuration.getProps(Configuration.java:2424)
	at org.apache.flink.fs.s3hadoop.shaded.org.apache.hadoop.conf.Configuration.get(Configuration.java:1238)
	at org.apache.flink.fs.s3hadoop.S3FileSystemFactory.create(S3FileSystemFactory.java:98)
	at org.apache.flink.core.fs.FileSystem.getUnguardedFileSystem(FileSystem.java:397)
	at org.apache.flink.core.fs.FileSystem.get(FileSystem.java:320)
	at org.apache.flink.core.fs.Path.getFileSystem(Path.java:298)
	at org.apache.flink.api.common.io.FileOutputFormat.open(FileOutputFormat.java:218)
	at org.apache.flink.api.java.io.TextOutputFormat.open(TextOutputFormat.java:88)
	at org.apache.flink.streaming.api.functions.sink.OutputFormatSinkFunction.open(OutputFormatSinkFunction.java:64)
	at org.apache.flink.api.common.functions.util.FunctionUtils.openFunction(FunctionUtils.java:36)
	at org.apache.flink.streaming.api.operators.AbstractUdfStreamOperator.open(AbstractUdfStreamOperator.java:102)
	at org.apache.flink.streaming.api.operators.StreamSink.open(StreamSink.java:48)
	at org.apache.flink.streaming.runtime.tasks.StreamTask.openAllOperators(StreamTask.java:420)
	at org.apache.flink.streaming.runtime.tasks.StreamTask.invoke(StreamTask.java:296)
	at org.apache.flink.runtime.taskmanager.Task.run(Task.java:703)
	at java.lang.Thread.run(Thread.java:748)

--------------------------
Ken Krugler
+1 530-210-6378
http://www.scaleunlimited.com
Custom big data solutions & training
Flink, Solr, Hadoop, Cascading & Cassandra