S3 parquet files as Sink in the Table SQL API


meneldor
Hello,
I am using PyFlink and I want to write records from the Table SQL API as Parquet files on AWS S3. I followed the documentation, but it seems that I'm missing some dependencies and/or configuration. Here is the SQL:
CREATE TABLE sink_table(
`id` VARCHAR,
`type` VARCHAR,
`machn` VARCHAR,
`lastacct_id` BIGINT,
`upd_ts` BIGINT
) WITH (
'connector' = 'filesystem',
'path' = 's3a://my-bucket/flink_sink',
'format' = 'parquet'
)
This is the configuration in flink-conf.yaml:

s3.endpoint: https://s3.us-west-1.amazonaws.com
s3.path.style.access: true
s3.access-key: ***KEY-STRING***
s3.secret-key: ***KEY-SECRET-STRING***
s3.entropy.key: _entropy_
s3.entropy.length: 8
hadoop.s3.socket-timeout: 10m
I downloaded flink-s3-fs-hadoop-1.12.1.jar and flink-hadoop-compatibility_2.11-1.12.1.jar into plugins/, and flink-sql-parquet_2.11-1.12.1.jar into lib/.
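
For completeness, the DDL is executed from a minimal PyFlink program along these lines (a sketch; the checkpoint interval is illustrative, and checkpointing is enabled because the streaming filesystem sink only finalizes part files on checkpoints):

from pyflink.datastream import StreamExecutionEnvironment
from pyflink.table import StreamTableEnvironment

env = StreamExecutionEnvironment.get_execution_environment()
# the streaming filesystem sink commits/finalizes part files on checkpoints
env.enable_checkpointing(60 * 1000)  # illustrative interval: 60s
t_env = StreamTableEnvironment.create(env)
t_env.execute_sql("""
    CREATE TABLE sink_table(
        `id` VARCHAR,
        `type` VARCHAR,
        `machn` VARCHAR,
        `lastacct_id` BIGINT,
        `upd_ts` BIGINT
    ) WITH (
        'connector' = 'filesystem',
        'path' = 's3a://my-bucket/flink_sink',
        'format' = 'parquet'
    )
""")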

Here is the exception:
Traceback (most recent call last):
  File "s3_sink.py", line 101, in <module>
    """)
  File "/home/user/miniconda3/lib/python3.7/site-packages/pyflink/table/table_environment.py", line 766, in execute_sql
    return TableResult(self._j_tenv.executeSql(stmt))
  File "/home/user/miniconda3/lib/python3.7/site-packages/py4j/java_gateway.py", line 1286, in __call__
    answer, self.gateway_client, self.target_id, self.name)
  File "/home/user/miniconda3/lib/python3.7/site-packages/pyflink/util/exceptions.py", line 147, in deco
    return f(*a, **kw)
  File "/home/user/miniconda3/lib/python3.7/site-packages/py4j/protocol.py", line 328, in get_return_value
    format(target_id, ".", name), value)
py4j.protocol.Py4JJavaError: An error occurred while calling o14.executeSql.
: java.lang.NoClassDefFoundError: org/apache/hadoop/conf/Configuration
at org.apache.flink.formats.parquet.ParquetFileFormatFactory.getParquetConfiguration(ParquetFileFormatFactory.java:115)
at org.apache.flink.formats.parquet.ParquetFileFormatFactory.access$000(ParquetFileFormatFactory.java:51)
at org.apache.flink.formats.parquet.ParquetFileFormatFactory$2.createRuntimeEncoder(ParquetFileFormatFactory.java:103)
at org.apache.flink.formats.parquet.ParquetFileFormatFactory$2.createRuntimeEncoder(ParquetFileFormatFactory.java:97)
at org.apache.flink.table.filesystem.FileSystemTableSink.createWriter(FileSystemTableSink.java:373)
at org.apache.flink.table.filesystem.FileSystemTableSink.createStreamingSink(FileSystemTableSink.java:183)
at org.apache.flink.table.filesystem.FileSystemTableSink.consume(FileSystemTableSink.java:145)
at org.apache.flink.table.filesystem.FileSystemTableSink.lambda$getSinkRuntimeProvider$0(FileSystemTableSink.java:134)
at org.apache.flink.table.planner.plan.nodes.common.CommonPhysicalSink.createSinkTransformation(CommonPhysicalSink.scala:95)
at org.apache.flink.table.planner.plan.nodes.physical.stream.StreamExecSink.translateToPlanInternal(StreamExecSink.scala:103)
at org.apache.flink.table.planner.plan.nodes.physical.stream.StreamExecSink.translateToPlanInternal(StreamExecSink.scala:43)
at org.apache.flink.table.planner.plan.nodes.exec.ExecNode$class.translateToPlan(ExecNode.scala:59)
at org.apache.flink.table.planner.plan.nodes.physical.stream.StreamExecSink.translateToPlan(StreamExecSink.scala:43)
at org.apache.flink.table.planner.delegation.StreamPlanner$$anonfun$translateToPlan$1.apply(StreamPlanner.scala:66)
at org.apache.flink.table.planner.delegation.StreamPlanner$$anonfun$translateToPlan$1.apply(StreamPlanner.scala:65)
at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
at scala.collection.Iterator$class.foreach(Iterator.scala:891)
at scala.collection.AbstractIterator.foreach(Iterator.scala:1334)
at scala.collection.IterableLike$class.foreach(IterableLike.scala:72)
at scala.collection.AbstractIterable.foreach(Iterable.scala:54)
at scala.collection.TraversableLike$class.map(TraversableLike.scala:234)
at scala.collection.AbstractTraversable.map(Traversable.scala:104)
at org.apache.flink.table.planner.delegation.StreamPlanner.translateToPlan(StreamPlanner.scala:65)
at org.apache.flink.table.planner.delegation.PlannerBase.translate(PlannerBase.scala:167)
at org.apache.flink.table.api.internal.TableEnvironmentImpl.translate(TableEnvironmentImpl.java:1329)
at org.apache.flink.table.api.internal.TableEnvironmentImpl.executeInternal(TableEnvironmentImpl.java:676)
at org.apache.flink.table.api.internal.TableEnvironmentImpl.executeOperation(TableEnvironmentImpl.java:767)
at org.apache.flink.table.api.internal.TableEnvironmentImpl.executeSql(TableEnvironmentImpl.java:666)
at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.base/java.lang.reflect.Method.invoke(Method.java:566)
at org.apache.flink.api.python.shaded.py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
at org.apache.flink.api.python.shaded.py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
at org.apache.flink.api.python.shaded.py4j.Gateway.invoke(Gateway.java:282)
at org.apache.flink.api.python.shaded.py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
at org.apache.flink.api.python.shaded.py4j.commands.CallCommand.execute(CallCommand.java:79)
at org.apache.flink.api.python.shaded.py4j.GatewayConnection.run(GatewayConnection.java:238)
at java.base/java.lang.Thread.run(Thread.java:834)
Caused by: java.lang.ClassNotFoundException: org.apache.hadoop.conf.Configuration
at java.base/jdk.internal.loader.BuiltinClassLoader.loadClass(BuiltinClassLoader.java:581)
at java.base/jdk.internal.loader.ClassLoaders$AppClassLoader.loadClass(ClassLoaders.java:178)
at java.base/java.lang.ClassLoader.loadClass(ClassLoader.java:521)
... 40 more
 
Thank you!
Re: S3 parquet files as Sink in the Table SQL API

Matthias
Hi,
Have you tried using the bundled Hadoop uber jar [1]? It looks like some Hadoop dependencies are missing.

Best,
Matthias


Re: S3 parquet files as Sink in the Table SQL API

meneldor
Well, I'm not sure which of these actually helped, but it works now. I downloaded the following jars into plugins/s3-fs-hadoop/:
flink-hadoop-compatibility_2.11-1.12.1.jar  
flink-s3-fs-hadoop-1.12.1.jar  
flink-sql-parquet_2.11-1.12.1.jar  
force-shading-1.12.1.jar  
hadoop-mapreduce-client-core-2.7.2.jar
Also, I included them in t_env.get_config().get_configuration().set_string("pipeline.jars", "......").
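
For reference, pipeline.jars takes a semicolon-separated list of file:// URLs; a sketch with placeholder paths (the real paths on my machine differ):

# the jar paths below are placeholders, not the actual ones used
t_env.get_config().get_configuration().set_string(
    "pipeline.jars",
    "file:///opt/flink/lib/flink-sql-parquet_2.11-1.12.1.jar;"
    "file:///opt/flink/lib/flink-hadoop-compatibility_2.11-1.12.1.jar")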
The only thing I can't figure out now is how to change the file naming. All files are named like part-5f696f6d-4f66-46ea-b321-891c76cae89e-0-1 and don't have a '.parquet' suffix.

Thanks

Re: S3 parquet files as Sink in the Table SQL API

Arvid Heise-4
Hi,

If you just want to use s3a, you only need flink-s3-fs-hadoop-1.12.1.jar in the plugins/ directory.

The format jar flink-sql-parquet_2.11-1.12.1.jar should be in lib/.
None of the other jars are needed, AFAIK.
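
In other words, the expected layout would be something like this (a quick sanity-check sketch; FLINK_HOME and the plugin subdirectory name are assumptions based on a default install):

import os

flink_home = os.environ["FLINK_HOME"]  # assumed to point at the Flink installation
# each filesystem plugin lives in its own subdirectory under plugins/
print(os.listdir(os.path.join(flink_home, "plugins", "s3-fs-hadoop")))
# expected: ['flink-s3-fs-hadoop-1.12.1.jar']
print(os.listdir(os.path.join(flink_home, "lib")))
# expected to include 'flink-sql-parquet_2.11-1.12.1.jar'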
