Reading ORC format on Flink

Reading ORC format on Flink

Philip Lee
Hello, 

Question about reading ORC format on Flink.

I want to use a dataset after converting it from CSV to ORC format with Hive for load testing.
Can Flink support reading the ORC format?

If so, please let me know how to use the dataset in Flink.

Best,
Phil




Re: Reading ORC format on Flink

Chiwan Park-2
Hi Phil,

I think you can read ORC files using OrcInputFormat [1] with the readHadoopFile method.

There is a MapReduce example [2] on Stack Overflow; the approach also works on Flink. You may have to use a RichMapFunction [3] to initialize the OrcSerde and StructObjectInspector objects.
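Something along these lines might work (an untested sketch against the DataSet API; the HDFS path and the two-column schema "name string, age int" are placeholders, and since OrcInputFormat may not extend the mapred FileInputFormat in your Hive version, the sketch uses createHadoopInput, the variant of readHadoopFile that accepts any mapred InputFormat):

import java.util.Properties;

import org.apache.flink.api.common.functions.RichMapFunction;
import org.apache.flink.api.java.DataSet;
import org.apache.flink.api.java.ExecutionEnvironment;
import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.configuration.Configuration;

import org.apache.hadoop.hive.ql.io.orc.OrcInputFormat;
import org.apache.hadoop.hive.ql.io.orc.OrcSerde;
import org.apache.hadoop.hive.ql.io.orc.OrcStruct;
import org.apache.hadoop.hive.serde2.objectinspector.StructField;
import org.apache.hadoop.hive.serde2.objectinspector.StructObjectInspector;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.mapred.FileInputFormat;
import org.apache.hadoop.mapred.JobConf;

public class ReadOrcExample {

  public static void main(String[] args) throws Exception {
    ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();

    // Point the Hadoop job configuration at the ORC files written by Hive
    // (path is made up for this example).
    JobConf jobConf = new JobConf();
    FileInputFormat.setInputPaths(jobConf, "hdfs:///warehouse/mydb.db/mytable");

    // Wrap Hive's mapred OrcInputFormat; each record is (NullWritable, OrcStruct).
    DataSet<Tuple2<NullWritable, OrcStruct>> orcRows = env.createHadoopInput(
        new OrcInputFormat(), NullWritable.class, OrcStruct.class, jobConf);

    // Turn the generic OrcStruct rows into typed tuples.
    DataSet<Tuple2<String, Integer>> rows = orcRows.map(new OrcRowMapper());

    rows.print();
  }

  // RichMapFunction so OrcSerde and the StructObjectInspector are created once
  // per task in open(), not once per record.
  public static class OrcRowMapper
      extends RichMapFunction<Tuple2<NullWritable, OrcStruct>, Tuple2<String, Integer>> {

    private transient StructObjectInspector inspector;
    private transient StructField nameField;
    private transient StructField ageField;

    @Override
    public void open(Configuration parameters) throws Exception {
      // Table schema is assumed to be (name string, age int) -- adjust to your table.
      Properties tableProps = new Properties();
      tableProps.setProperty("columns", "name,age");
      tableProps.setProperty("columns.types", "string:int");

      OrcSerde serde = new OrcSerde();
      serde.initialize(new org.apache.hadoop.conf.Configuration(), tableProps);

      inspector = (StructObjectInspector) serde.getObjectInspector();
      nameField = inspector.getStructFieldRef("name");
      ageField = inspector.getStructFieldRef("age");
    }

    @Override
    public Tuple2<String, Integer> map(Tuple2<NullWritable, OrcStruct> record) {
      OrcStruct row = record.f1;
      // Field values come back as Hadoop writables (Text, IntWritable, ...).
      String name = String.valueOf(inspector.getStructFieldData(row, nameField));
      int age = Integer.parseInt(String.valueOf(inspector.getStructFieldData(row, ageField)));
      return new Tuple2<>(name, age);
    }
  }
}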

Regards,
Chiwan Park

[1]: https://hive.apache.org/javadocs/r0.13.1/api/ql/org/apache/hadoop/hive/ql/io/orc/OrcInputFormat.html
[2]: http://stackoverflow.com/questions/22673222/how-do-you-use-orcfile-input-output-format-in-mapreduce
[3]: https://ci.apache.org/projects/flink/flink-docs-release-0.10/api/java/org/apache/flink/api/common/functions/RichMapFunction.html
