(DEPRECATED) Apache Flink User Mailing List archive.

Re: Parquet example

Posted by Fabian Hueske on
URL: http://deprecated-apache-flink-user-mailing-list-archive.369.s1.nabble.com/Parquet-example-tp380p398.html

Hi,

I just want to let you know that we opened a JIRA issue (FLINK-1236) to support local split assignment for the HadoopInputFormat.

At least this performance issue should be easy to solve :-)
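To make "local split assignment" concrete, here is a minimal, hypothetical Java sketch. It is not Flink's actual assigner and not the FLINK-1236 change. Each split carries the hosts that store its data, and a task asking for work first gets a split stored on its own host. It falls back to a remote split only when no local one is left. The names `LocalSplitAssignment`, `LocalityAwareAssigner` and `nextSplit` are illustrative assumptions.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.Set;

// Hypothetical, simplified model of locality-aware split assignment.
// Not Flink's real implementation; all names are illustrative.
public class LocalSplitAssignment {

    /** An input split plus the hosts that store its data (e.g. HDFS block replicas). */
    record Split(int id, Set<String> hosts) {}

    /** Hands out splits, preferring ones whose data lives on the requesting host. */
    static final class LocalityAwareAssigner {
        private final List<Split> unassigned;
        private int localAssignments = 0;
        private int remoteAssignments = 0;

        LocalityAwareAssigner(List<Split> splits) {
            this.unassigned = new ArrayList<>(splits);
        }

        /** Returns the next split for a task running on {@code host}, or null if none remain. */
        Split nextSplit(String host) {
            // First pass: look for a split with a replica on the requesting host.
            Iterator<Split> it = unassigned.iterator();
            while (it.hasNext()) {
                Split s = it.next();
                if (s.hosts().contains(host)) {
                    it.remove();
                    localAssignments++;
                    return s;
                }
            }
            // Fallback: no local split is left, so hand out any remaining one (remote read).
            if (unassigned.isEmpty()) {
                return null;
            }
            remoteAssignments++;
            return unassigned.remove(0);
        }

        int localAssignments() { return localAssignments; }
        int remoteAssignments() { return remoteAssignments; }
    }

    public static void main(String[] args) {
        LocalityAwareAssigner assigner = new LocalityAwareAssigner(List.of(
                new Split(0, Set.of("nodeA", "nodeB")),
                new Split(1, Set.of("nodeB", "nodeC")),
                new Split(2, Set.of("nodeC", "nodeA"))));
        System.out.println("nodeC -> split " + assigner.nextSplit("nodeC").id()); // prints "nodeC -> split 1"
        System.out.println("nodeB -> split " + assigner.nextSplit("nodeB").id()); // prints "nodeB -> split 0"
        System.out.println("nodeB -> split " + assigner.nextSplit("nodeB").id()); // prints "nodeB -> split 2" (a remote read)
        System.out.println("local=" + assigner.localAssignments()
                + ", remote=" + assigner.remoteAssignments()); // prints "local=2, remote=1"
    }
}
```

A naive assigner that hands out splits in list order would have given `nodeC` split 0, whose data is not stored on that host. Preferring local splits avoids such network reads whenever possible.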

2014-11-11 12:44 GMT+01:00 Fabian Hueske:

First of all, split locality can make a huge difference.
A dedicated input format would also enable tighter integration, both API-wise and for execution, for example by pushing filters or projections directly into the data source and thereby reducing the amount of data read from the file system.
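To show what pushing filters and projections into the data source can save, here is a minimal, self-contained Java sketch. It models a hypothetical, loosely Parquet-like columnar layout and is not Parquet's or Flink's actual API. Data is stored column by column in row groups, each with a per-column max statistic. With pushdown, the reader skips row groups whose statistics show that no row can satisfy `filterCol >= minValue`, and it reads only the projected and filter columns. Without pushdown, every column of every row group is read and filtered afterwards. All names are illustrative assumptions, and the "values read" counter is a simulated stand-in for file-system I/O.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical, simplified columnar layout (loosely Parquet-like); not a real library API.
public class PushdownSketch {

    /** A row group: column name -> values, plus per-column max statistics. */
    static final class RowGroup {
        final Map<String, long[]> columns;
        // Statistics assumed to be written alongside the data, so consulting them is cheap.
        final Map<String, Long> max = new HashMap<>();

        RowGroup(Map<String, long[]> columns) {
            this.columns = columns;
            columns.forEach((name, vals) ->
                    max.put(name, Arrays.stream(vals).max().orElse(Long.MIN_VALUE)));
        }

        int numRows() {
            return columns.values().iterator().next().length;
        }
    }

    /** Matching rows (projected columns only) and the number of column values read. */
    record ScanResult(List<long[]> rows, long valuesRead) {}

    /** Evaluates "SELECT projection WHERE filterCol >= minValue" over all row groups. */
    static ScanResult scan(List<RowGroup> groups, List<String> projection,
                           String filterCol, long minValue, boolean pushdown) {
        // The filter column must be read even if it is not projected.
        Set<String> needed = new LinkedHashSet<>(projection);
        needed.add(filterCol);
        List<long[]> rows = new ArrayList<>();
        long valuesRead = 0;
        for (RowGroup g : groups) {
            // Filter pushdown: skip a row group whose max proves no row can match.
            if (pushdown && g.max.get(filterCol) < minValue) {
                continue;
            }
            // Projection pushdown: read only the needed columns; otherwise read every column.
            Set<String> toRead = pushdown ? needed : g.columns.keySet();
            for (String col : toRead) {
                valuesRead += g.columns.get(col).length;
            }
            // Apply the filter row by row and keep only the projected columns.
            long[] filterVals = g.columns.get(filterCol);
            for (int r = 0; r < g.numRows(); r++) {
                if (filterVals[r] >= minValue) {
                    long[] row = new long[projection.size()];
                    for (int c = 0; c < projection.size(); c++) {
                        row[c] = g.columns.get(projection.get(c))[r];
                    }
                    rows.add(row);
                }
            }
        }
        return new ScanResult(rows, valuesRead);
    }

    public static void main(String[] args) {
        List<RowGroup> groups = List.of(
                new RowGroup(Map.of("id", new long[] {1, 2, 3},
                        "price", new long[] {10, 20, 30},
                        "qty", new long[] {5, 5, 5})),
                new RowGroup(Map.of("id", new long[] {4, 5, 6},
                        "price", new long[] {100, 200, 300},
                        "qty", new long[] {1, 1, 1})));
        // Query: SELECT id WHERE price >= 100
        ScanResult pushed = scan(groups, List.of("id"), "price", 100, true);
        ScanResult full = scan(groups, List.of("id"), "price", 100, false);
        System.out.println("with pushdown: " + pushed.rows().size() + " rows, "
                + pushed.valuesRead() + " values read"); // with pushdown: 3 rows, 6 values read
        System.out.println("full scan: " + full.rows().size() + " rows, "
                + full.valuesRead() + " values read"); // full scan: 3 rows, 18 values read
    }
}
```

Both scans return the same three rows, but the pushed-down scan reads 6 values instead of 18. It skips the first row group entirely and never touches the `qty` column. A wrapped Hadoop format that is unaware of the query cannot make these savings on its own.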

2014-11-11 12:30 GMT+01:00 Flavio Pompermaier:
Maybe this is a dumb question, but could you explain to me the benefits of a dedicated Flink IF (InputFormat) versus the one available by default through the Hadoop IF wrapper?
Is it just because of the data locality of task slots?

On Tue, Nov 11, 2014 at 12:16 PM, Fabian Hueske wrote:
Hi Flavio,

I am not aware of a Flink InputFormat for Parquet. However, it should hopefully be covered by the Hadoop IF wrapper.
A dedicated Flink IF would be great, though, IMO.

Best, Fabian

2014-11-11 12:10 GMT+01:00 Flavio Pompermaier:
Hi to all,

I'd like to know whether Flink is able to exploit the Parquet format to read data efficiently from HDFS.
Is there any example available?

Best,
Flavio