How to write dataset as parquet format

ebru
Hello all,

We are trying to write a DataSet in Parquet format. We use AvroParquetOutputFormat, but it is not compatible with Flink’s FileOutputFormat.

Is there a way to write a DataSet as Parquet?

-Ebru

Re: How to write dataset as parquet format

Fabian Hueske-2
Hi Ebru,

AvroParquetOutputFormat seems to implement Hadoop's OutputFormat interface.
Flink provides a wrapper for Hadoop's OutputFormat [1], so you can try to wrap AvroParquetOutputFormat in Flink's HadoopOutputFormat.
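A minimal sketch of this wrapping follows. It assumes Flink's DataSet API (`flink-java` plus `flink-clients` for local runs), a `parquet-avro` release in which `AvroParquetOutputFormat` is generic, and a Hadoop client on the classpath. `AvroParquetOutputFormat` extends the new-API (`org.apache.hadoop.mapreduce`) `FileOutputFormat` with a `Void` key, so the matching wrapper is `org.apache.flink.api.java.hadoop.mapreduce.HadoopOutputFormat<Void, V>`, and every record is emitted as a `Tuple2` with a `null` key. The class `ParquetWriteSketch`, the mapper `ToAvro`, the `Person` schema and the default output path are hypothetical placeholders.

```java
import org.apache.avro.Schema;
import org.apache.avro.generic.GenericData;
import org.apache.flink.api.common.functions.MapFunction;
import org.apache.flink.api.java.DataSet;
import org.apache.flink.api.java.ExecutionEnvironment;
import org.apache.flink.api.java.hadoop.mapreduce.HadoopOutputFormat;
import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.parquet.avro.AvroParquetOutputFormat;

public class ParquetWriteSketch {

    // Hypothetical example schema. It is a static field so each JVM parses it
    // locally instead of serializing it with the function (Avro's Schema is not
    // Serializable in older Avro versions).
    static final Schema SCHEMA = new Schema.Parser().parse(
            "{\"type\":\"record\",\"name\":\"Person\",\"fields\":["
            + "{\"name\":\"name\",\"type\":\"string\"},"
            + "{\"name\":\"age\",\"type\":\"int\"}]}");

    // Hypothetical mapper: turns a Flink tuple into the (key, value) pair that the
    // wrapped Hadoop OutputFormat expects. Parquet ignores the key, so it is Void/null.
    public static class ToAvro
            implements MapFunction<Tuple2<String, Integer>, Tuple2<Void, GenericData.Record>> {
        @Override
        public Tuple2<Void, GenericData.Record> map(Tuple2<String, Integer> in) {
            GenericData.Record rec = new GenericData.Record(SCHEMA);
            rec.put("name", in.f0);
            rec.put("age", in.f1);
            return new Tuple2<>(null, rec);
        }
    }

    public static void main(String[] args) throws Exception {
        String outputDir = args.length > 0 ? args[0] : "/tmp/flink-parquet-out";

        ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();
        env.setParallelism(1); // single output file, only to keep the demo small

        DataSet<Tuple2<String, Integer>> people = env.fromElements(
                new Tuple2<String, Integer>("alice", 34),
                new Tuple2<String, Integer>("bob", 27));

        DataSet<Tuple2<Void, GenericData.Record>> records = people.map(new ToAvro());

        // Hadoop-side configuration: the Avro schema Parquet should write, and the output directory.
        Job job = Job.getInstance();
        AvroParquetOutputFormat.setSchema(job, SCHEMA);
        FileOutputFormat.setOutputPath(job, new Path(outputDir));

        // Wrap the mapreduce-API OutputFormat in Flink's HadoopOutputFormat and use it as a sink.
        HadoopOutputFormat<Void, GenericData.Record> parquetFormat =
                new HadoopOutputFormat<>(new AvroParquetOutputFormat<GenericData.Record>(), job);

        records.output(parquetFormat);
        env.execute("Write DataSet as Parquet (sketch)");
    }
}
```

Declaring the concrete `GenericData.Record` class instead of the `GenericRecord` interface is a judgment call in this sketch. Either way, the records reach the sink through Flink's generic (Kryo) serialization, which is slow. Avro-generated specific classes may be a better fit for production jobs.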

Hope this helps,
Fabian

[1] https://ci.apache.org/projects/flink/flink-docs-release-1.3/dev/batch/hadoop_compatibility.html#using-hadoop-outputformats

(In reply to ebru, 2017-11-22 15:21 GMT+01:00)


Re: How to write dataset as parquet format

Flavio Pompermaier

(In reply to Fabian Hueske, 22 Nov 2017 18:29)


Re: How to write dataset as parquet format

ebru
Flavio and Fabian, thanks for your quick answers; they were very helpful.

-Ebru
(In reply to Flavio Pompermaier, 22 Nov 2017 20:47)