Hi all,
I'd like to know whether Flink is able to exploit the Parquet format to read data efficiently from HDFS. Is there any example available?

Best,
Flavio
Hi Flavio,

I am not aware of a Flink InputFormat for Parquet. However, it should hopefully be covered by the Hadoop IF wrapper.

2014-11-11 12:10 GMT+01:00 Flavio Pompermaier <[hidden email]>:
Maybe this is a dumb question, but could you explain to me the benefits of a dedicated Flink IF vs. the one available by default through the Hadoop IF wrapper? Is it just because of data locality of task slots?

On Tue, Nov 11, 2014 at 12:16 PM, Fabian Hueske <[hidden email]> wrote:
First of all, split locality can make a huge difference. A dedicated IF also enables tighter integration, both API-wise and at execution time, for example by pushing filters or projections directly into the data source and thereby reducing the data that has to be read from the file system.

2014-11-11 12:30 GMT+01:00 Flavio Pompermaier <[hidden email]>:
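The projection-pushdown benefit mentioned above can be illustrated with a toy, self-contained sketch (plain Java; this is not Flink's or Parquet's actual API, and all names are made up for illustration): a column-oriented source that materializes only the requested columns reads far fewer cells than one that always reads every column and discards the extras later.

```java
/**
 * Toy illustration (not Flink's actual API) of why pushing a projection
 * into the data source helps: the source only materializes the requested
 * columns instead of every column of every row.
 */
public class ProjectionPushdown {

    // A "file" stored column-wise: COLUMNS[i] holds all values of column i.
    static final String[][] COLUMNS = {
        {"alice", "bob", "carol"},   // column 0: name
        {"30", "25", "41"},          // column 1: age
        {"rome", "oslo", "paris"}    // column 2: city
    };

    /** Without pushdown: read every column, then discard the unwanted ones. */
    static int readAll() {
        int cellsRead = 0;
        for (String[] column : COLUMNS) {
            cellsRead += column.length; // every cell is read from "disk"
        }
        return cellsRead;
    }

    /** With pushdown: the source reads only the projected columns. */
    static int readProjected(int... wanted) {
        int cellsRead = 0;
        for (int i : wanted) {
            cellsRead += COLUMNS[i].length;
        }
        return cellsRead;
    }

    public static void main(String[] args) {
        System.out.println("cells read without pushdown: " + readAll());
        System.out.println("cells read with pushdown:    " + readProjected(0, 2));
    }
}
```

With a columnar on-disk layout like Parquet's, skipping a column this way avoids the I/O entirely, which is exactly what pushing the projection into the source buys.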
Hi, just want to let you know that we opened a JIRA issue (FLINK-1236) to support local split assignment for the HadoopInputFormat. At least this performance issue should be easy to solve :-)

2014-11-11 12:44 GMT+01:00 Fabian Hueske <[hidden email]>:
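The idea behind local split assignment can be sketched in a few lines of plain Java (names and structure here are illustrative assumptions, not Flink's real classes): each input split records which hosts store its data, and the assigner prefers handing a worker a split whose data is local to that worker's host, falling back to a remote split only when nothing local remains.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Iterator;
import java.util.LinkedList;
import java.util.List;
import java.util.Set;

/**
 * Toy sketch of locality-aware split assignment (the idea behind FLINK-1236).
 * Not Flink's actual implementation; all names are hypothetical.
 */
public class LocalSplitAssigner {

    static class Split {
        final int id;
        final Set<String> hosts; // hosts that store this split's data
        Split(int id, String... hosts) {
            this.id = id;
            this.hosts = new HashSet<>(Arrays.asList(hosts));
        }
    }

    private final List<Split> unassigned;

    LocalSplitAssigner(List<Split> splits) {
        this.unassigned = new LinkedList<>(splits);
    }

    /** Prefer a split stored on the requesting host; else hand out any remaining split. */
    Split nextSplit(String requestingHost) {
        for (Iterator<Split> it = unassigned.iterator(); it.hasNext(); ) {
            Split s = it.next();
            if (s.hosts.contains(requestingHost)) {
                it.remove();
                return s; // local read: no network transfer needed
            }
        }
        return unassigned.isEmpty() ? null : unassigned.remove(0); // remote fallback
    }

    public static void main(String[] args) {
        LocalSplitAssigner assigner = new LocalSplitAssigner(Arrays.asList(
            new Split(0, "host-a"), new Split(1, "host-b"), new Split(2, "host-a")));
        System.out.println(assigner.nextSplit("host-b").id); // local split preferred
        System.out.println(assigner.nextSplit("host-b").id); // no local split left: remote
    }
}
```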
Yes, I've read it! Will it also support the HBase TableInputFormat (HTable and Scan are no longer serializable), or does the HBase addon basically become useless?

On Nov 12, 2014 9:10 PM, "Fabian Hueske" <[hidden email]> wrote:
I guess this depends on how the Flink TableInputFormat is implemented. In its current state, the TableInputFormat returns a key-value pair, just like the Hadoop HBase IF does. A Flink HBase IF could, for example, also unpack the HBase Result object into a tuple of column values, depending on the HBase query.

2014-11-12 21:17 GMT+01:00 Flavio Pompermaier <[hidden email]>:
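The unpacking idea can be sketched with a toy example (plain Java; FakeResult and Tuple3 below are hypothetical stand-ins, not the real HBase Result or Flink tuple classes): instead of emitting an opaque key-value pair, the input format flattens the columns named by the query into positional tuple fields.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/**
 * Toy sketch of unpacking a scan result into a flat tuple of column values,
 * instead of returning an opaque key-value pair the way the Hadoop HBase
 * InputFormat does. Not actual Flink or HBase code.
 */
public class ResultUnpacking {

    /** Stand-in for an HBase Result: a row key plus a column -> value map. */
    static class FakeResult {
        final String rowKey;
        final Map<String, String> columns = new LinkedHashMap<>();
        FakeResult(String rowKey) { this.rowKey = rowKey; }
    }

    /** Stand-in for a Flink Tuple3<String, String, String>. */
    static class Tuple3 {
        final String f0, f1, f2;
        Tuple3(String f0, String f1, String f2) { this.f0 = f0; this.f1 = f1; this.f2 = f2; }
        @Override public String toString() { return "(" + f0 + "," + f1 + "," + f2 + ")"; }
    }

    /** Unpack the row key and the two columns named by the query into tuple fields. */
    static Tuple3 unpack(FakeResult result, String col1, String col2) {
        return new Tuple3(result.rowKey, result.columns.get(col1), result.columns.get(col2));
    }

    public static void main(String[] args) {
        FakeResult r = new FakeResult("row-1");
        r.columns.put("cf:name", "flink");
        r.columns.put("cf:year", "2014");
        System.out.println(unpack(r, "cf:name", "cf:year"));
    }
}
```

Downstream operators can then work on typed tuple fields directly, rather than digging values out of a result object in every user function.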