Hi to all,
is there a way to write out Parquet-Avro data using BatchTableEnvironment with Flink 1.11? At the moment I'm using the Hadoop ParquetOutputFormat, but I hope to be able to get rid of it sooner or later. I saw that there's the AvroOutputFormat, but there's no support for using it with Parquet. Best, Flavio
Hi Flavio, AvroOutputFormat only supports writing Avro files. I think you can use `AvroParquetOutputFormat` as a Hadoop output format and wrap it through Flink's `HadoopOutputFormat`. Best, Jingsong
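A minimal sketch of that wrapping on the DataSet API, assuming the flink-hadoop-compatibility, flink-avro, and parquet-avro dependencies are on the classpath (the schema, path, and sample data below are made up for illustration):

```java
import org.apache.avro.Schema;
import org.apache.avro.generic.GenericData;
import org.apache.avro.generic.GenericRecord;
import org.apache.flink.api.common.typeinfo.BasicTypeInfo;
import org.apache.flink.api.java.DataSet;
import org.apache.flink.api.java.ExecutionEnvironment;
import org.apache.flink.api.java.hadoop.mapreduce.HadoopOutputFormat;
import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.api.java.typeutils.TupleTypeInfo;
import org.apache.flink.formats.avro.typeutils.GenericRecordAvroTypeInfo;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.parquet.avro.AvroParquetOutputFormat;

public class ParquetAvroWriteSketch {

    // Hypothetical schema, only for illustration.
    private static final String SCHEMA_JSON =
        "{\"type\":\"record\",\"name\":\"User\",\"fields\":["
            + "{\"name\":\"id\",\"type\":\"long\"},"
            + "{\"name\":\"name\",\"type\":\"string\"}]}";

    public static void main(String[] args) throws Exception {
        ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();
        Schema schema = new Schema.Parser().parse(SCHEMA_JSON);

        // Parquet's Hadoop output format expects (Void, record) pairs.
        DataSet<Tuple2<Void, GenericRecord>> records = env
            .fromElements("flink", "parquet")
            .map(name -> {
                // Re-parse inside the function: Avro's Schema is not Serializable.
                GenericRecord r = new GenericData.Record(
                    new Schema.Parser().parse(SCHEMA_JSON));
                r.put("id", (long) name.length());
                r.put("name", name);
                return Tuple2.<Void, GenericRecord>of(null, r);
            })
            .returns(new TupleTypeInfo<Tuple2<Void, GenericRecord>>(
                BasicTypeInfo.VOID_TYPE_INFO, new GenericRecordAvroTypeInfo(schema)));

        // Configure AvroParquetOutputFormat through a Hadoop Job, then wrap it for Flink.
        Job job = Job.getInstance();
        AvroParquetOutputFormat.setSchema(job, schema);
        FileOutputFormat.setOutputPath(job, new Path("file:///tmp/parquet-out"));

        HadoopOutputFormat<Void, GenericRecord> parquetOut =
            new HadoopOutputFormat<>(new AvroParquetOutputFormat<GenericRecord>(), job);

        records.output(parquetOut);
        env.execute("parquet-avro-write");
    }
}
```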
This is what I actually do, but I was hoping to be able to get rid of the HadoopOutputFormat and use a more comfortable Source/Sink implementation.
In Table/SQL, I think we don't need a source/sink for `AvroParquetOutputFormat`, because the data structure is always Row or RowData; it should not be an Avro object. Best, Jingsong
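To illustrate the point: in Flink 1.11 the filesystem connector can write Parquet directly from the Table API, with no Avro objects in user code. A minimal sketch, assuming the flink-parquet dependency is available (table names, columns, and the path are made up):

```java
import org.apache.flink.table.api.EnvironmentSettings;
import org.apache.flink.table.api.TableEnvironment;

public class ParquetTableSketch {
    public static void main(String[] args) {
        TableEnvironment tEnv = TableEnvironment.create(
            EnvironmentSettings.newInstance().useBlinkPlanner().inBatchMode().build());

        // Hypothetical sink table: rows are written as Parquet files under the given path.
        tEnv.executeSql(
            "CREATE TABLE parquet_sink ("
                + "  id BIGINT,"
                + "  name STRING"
                + ") WITH ("
                + "  'connector' = 'filesystem',"
                + "  'path' = 'file:///tmp/parquet-out',"
                + "  'format' = 'parquet'"
                + ")");

        // Hypothetical source, just to make the example self-contained.
        tEnv.executeSql(
            "CREATE TABLE demo_source ("
                + "  id BIGINT,"
                + "  name STRING"
                + ") WITH ("
                + "  'connector' = 'datagen',"
                + "  'number-of-rows' = '10'"
                + ")");

        // In 1.11, executeSql submits the INSERT job asynchronously.
        tEnv.executeSql("INSERT INTO parquet_sink SELECT id, name FROM demo_source");
    }
}
```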
I think that's not true when you need to integrate Flink into an existing data lake. In my opinion it should be very straightforward to read/write Parquet data with objects serialized with Avro/Thrift/Protobuf, or at least to reuse the Hadoop input/output formats with the Table API. At the moment I have to go through a lot of custom code that uses the Hadoop formats, and it's a lot of code just to read and write Thrift- or Avro-serialized objects in Parquet folders.
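The kind of glue code being described might look roughly like this: read (Void, GenericRecord) pairs through the wrapped Hadoop input format, then manually unpack each record into a Row before handing it to the Table API. A sketch only, with hypothetical field names and path, assuming flink-hadoop-compatibility, the flink-table bridge, and parquet-avro:

```java
import org.apache.avro.generic.GenericRecord;
import org.apache.flink.api.common.typeinfo.Types;
import org.apache.flink.api.java.DataSet;
import org.apache.flink.api.java.ExecutionEnvironment;
import org.apache.flink.api.java.hadoop.mapreduce.HadoopInputFormat;
import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.api.java.typeutils.RowTypeInfo;
import org.apache.flink.table.api.Table;
import org.apache.flink.table.api.bridge.java.BatchTableEnvironment;
import org.apache.flink.types.Row;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.parquet.avro.AvroParquetInputFormat;

public class ParquetToTableSketch {
    public static void main(String[] args) throws Exception {
        ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();
        BatchTableEnvironment tEnv = BatchTableEnvironment.create(env);

        // Read Parquet files as (Void, GenericRecord) via the wrapped Hadoop format.
        Job job = Job.getInstance();
        FileInputFormat.addInputPath(job, new Path("file:///tmp/parquet-in"));
        DataSet<Tuple2<Void, GenericRecord>> parquet = env.createInput(
            new HadoopInputFormat<>(new AvroParquetInputFormat<GenericRecord>(),
                Void.class, GenericRecord.class, job));

        // Manually unpack each Avro record into a Row ("id" and "name" are made-up fields).
        DataSet<Row> rows = parquet
            .map(t -> Row.of(t.f1.get("id"), t.f1.get("name")))
            .returns(new RowTypeInfo(Types.LONG, Types.STRING));

        Table table = tEnv.fromDataSet(rows, "id, name");
        table.printSchema();
    }
}
```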