Hi,
I’m trying to read Parquet/Hive data using Parquet’s ParquetInputFormat and Hive’s DataWritableReadSupport; I’ve sketched the setup below.
I get an error when the TupleSerializer tries to create an instance of ArrayWritable through reflection, because ArrayWritable has no no-args constructor.
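For reference, here is roughly how I’m wiring things up. This is only a simplified sketch: the input path is a placeholder, and the exact package names (parquet vs org.apache.parquet, and the location of the Hadoop-compatibility HadoopInputFormat) depend on the versions in my build.

    import org.apache.flink.api.java.DataSet;
    import org.apache.flink.api.java.ExecutionEnvironment;
    import org.apache.flink.api.java.tuple.Tuple2;
    import org.apache.flink.hadoopcompatibility.mapreduce.HadoopInputFormat;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.hive.ql.io.parquet.read.DataWritableReadSupport;
    import org.apache.hadoop.io.ArrayWritable;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
    import parquet.hadoop.ParquetInputFormat;

    public class ParquetReadJob {
      public static void main(String[] args) throws Exception {
        ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();

        // Hadoop Job used only to carry the input configuration
        Job job = Job.getInstance();
        ParquetInputFormat.setReadSupportClass(job, DataWritableReadSupport.class);
        FileInputFormat.addInputPath(job, new Path("hdfs:///path/to/parquet")); // placeholder path

        // Wrap the mapreduce ParquetInputFormat in the Hadoop-compatibility input format
        HadoopInputFormat<Void, ArrayWritable> parquetFormat =
            new HadoopInputFormat<>(new ParquetInputFormat<ArrayWritable>(),
                Void.class, ArrayWritable.class, job);

        // This is the DataSet whose Tuple2 type makes the TupleSerializer
        // try to instantiate ArrayWritable by reflection
        DataSet<Tuple2<Void, ArrayWritable>> records = env.createInput(parquetFormat);

        // ... transformations and a sink would follow here, then env.execute()
      }
    }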
I’ve been able to make it work when executing on a local cluster by copying the ArrayWritable class into my own sources and adding the constructor; I guess the classpath built by Maven puts my code first and lets me override the original class. However, when running on the real cluster (YARN @ Cloudera) the exception comes back (I guess the original class comes first in the classpath there).
Do you have an idea of how I could make it work?
I think I’m tied to the ArrayWritable type because DataWritableReadSupport extends ReadSupport<ArrayWritable>.
Would it be possible (and not too complicated) to make a DataSource that does not generate Tuples and lets me convert the ArrayWritable to a friendlier type like String[] (something like the sketch below)? Or if you have any other ideas, they are welcome!
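To make the question more concrete, something along these lines is what I had in mind. I haven’t tested it and I don’t know whether it actually avoids the serializer instantiating ArrayWritable; the field-to-String conversion is just an illustration.

    import org.apache.flink.api.common.functions.MapFunction;
    import org.apache.hadoop.io.Writable;

    // "records" is the DataSet<Tuple2<Void, ArrayWritable>> from the setup above
    DataSet<String[]> rows = records.map(new MapFunction<Tuple2<Void, ArrayWritable>, String[]>() {
        @Override
        public String[] map(Tuple2<Void, ArrayWritable> record) {
            // Convert each field of the ArrayWritable to its string representation
            Writable[] fields = record.f1.get();
            String[] values = new String[fields.length];
            for (int i = 0; i < fields.length; i++) {
                values[i] = (fields[i] == null) ? null : fields[i].toString();
            }
            return values;
        }
    });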
B.R.
Gwenhaël PASQUIERS