Compression - AvroOutputFormat and over network ?

Tarandeep Singh
Hi,

How can I set compression for AvroOutputFormat when writing files to HDFS?
Also, can we set compression for intermediate data that is sent over the network (from the map phase to the reduce phase)?

Thanks,
Tarandeep

Re: Compression - AvroOutputFormat and over network ?

Ufuk Celebi
Hey Tarandeep,

Regarding the network part: this is not possible at the moment. It's pretty
straightforward to add support for it, but no one has gotten around to
actually implementing it. If you would like to contribute, I am happy
to give some hints about which parts of the system would need to be
modified.

– Ufuk


On Mon, Apr 18, 2016 at 12:56 PM, Tarandeep Singh <[hidden email]> wrote:
> Hi,
>
> How can I set compression for AvroOutputFormat when writing files on HDFS?
> Also, can we set compression for intermediate data that is sent over network
> (from map to reduce phase) ?
>
> Thanks,
> Tarandeep

Re: Compression - AvroOutputFormat and over network ?

rmetzger0
Hi Tarandeep,

I think for that you would need to set a codec factory on the DataFileWriter. Sadly we don't expose that method to the user.

If you want, you can contribute this change to Flink. Otherwise, I can quickly fix it.
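
For reference, here is a minimal sketch of what "setting a codec factory on the DataFileWriter" looks like when using the Avro Java API directly. It does not go through Flink's AvroOutputFormat, which (as noted above) does not expose this setting in this thread's version. The class name `AvroCodecSketch`, the `Event` schema, and the choice of deflate at level 6 are illustrative assumptions, not anything from Flink.

```java
import java.io.File;
import java.io.IOException;

import org.apache.avro.Schema;
import org.apache.avro.SchemaBuilder;
import org.apache.avro.file.CodecFactory;
import org.apache.avro.file.DataFileReader;
import org.apache.avro.file.DataFileWriter;
import org.apache.avro.generic.GenericData;
import org.apache.avro.generic.GenericDatumReader;
import org.apache.avro.generic.GenericDatumWriter;
import org.apache.avro.generic.GenericRecord;

public class AvroCodecSketch {

    // Hypothetical example schema: a record with a single string field.
    static final Schema SCHEMA = SchemaBuilder.record("Event")
            .fields()
            .requiredString("message")
            .endRecord();

    /**
     * Writes `count` records to `file` using the given codec, then reopens
     * the file and returns the codec name recorded in its header metadata.
     */
    static String writeAndReadCodec(File file, CodecFactory codec, int count)
            throws IOException {
        try (DataFileWriter<GenericRecord> writer =
                     new DataFileWriter<>(new GenericDatumWriter<GenericRecord>(SCHEMA))) {
            // The codec has to be set before create(); it cannot be changed
            // once the file has been opened for writing.
            writer.setCodec(codec);
            writer.create(SCHEMA, file);
            for (int i = 0; i < count; i++) {
                GenericRecord record = new GenericData.Record(SCHEMA);
                record.put("message", "event-" + i);
                writer.append(record);
            }
        }
        // Read the "avro.codec" entry back from the file header.
        try (DataFileReader<GenericRecord> reader =
                     new DataFileReader<>(file, new GenericDatumReader<GenericRecord>())) {
            return reader.getMetaString("avro.codec");
        }
    }

    public static void main(String[] args) throws IOException {
        File file = File.createTempFile("events", ".avro");
        file.deleteOnExit();
        // Deflate needs no extra dependency; level 6 is an arbitrary choice.
        System.out.println(writeAndReadCodec(file, CodecFactory.deflateCodec(6), 1000));
    }
}
```

Exposing this in AvroOutputFormat would presumably amount to letting the user pass a codec choice that is forwarded to `setCodec` before the writer is created; how exactly that is wired up is left open here.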

Regards,
Robert


On Mon, Apr 18, 2016 at 2:36 PM, Ufuk Celebi <[hidden email]> wrote:
> Hey Tarandeep,
>
> regarding the network part: not possible at the moment. It's pretty
> straightforward to add support for it, but no one ever got around to
> actually implementing it. If you would like to contribute, I am happy
> to give some hints about which parts of the system would need to be
> modified.
>
> – Ufuk


Re: Compression - AvroOutputFormat and over network ?

Tarandeep Singh
The Avro changes look easy. I think I can make them.
For the changes to compress network data, I need some direction.

@Ufuk, please point me to the corresponding code.

thanks,
Tarandeep

On Mon, Apr 18, 2016 at 11:05 AM, Robert Metzger <[hidden email]> wrote:
> Hi Tarandeep,
>
> I think for that you would need to set a codec factory on the DataFileWriter. Sadly we don't expose that method to the user.
>
> If you want, you can contribute this change to Flink. Otherwise, I can quickly fix it.
>
> Regards,
> Robert

