Process parquet files in batch mode with blink planner

Process parquet files in batch mode with blink planner

olivier_brazet
Hi community,

For a PoC I need to process some parquet files in batch mode.

I managed to implement some processing using the DataSet API. It is working fine.
Now, I would like to test the SQL API and the blink planner.

If I understand correctly, the ParquetTableSource is not compatible with the blink planner. So I am wondering whether there is a TableSource compatible with the blink planner that can be used to read parquet files, and whether any examples are available.

Thanks,

Olivier
Re: Process parquet files in batch mode with blink planner

Jingsong Li
Hi olivier,

Sorry for the late reply.
In the blink planner:
- only Hive parquet tables can be read at the moment.
- If you want to read native parquet files, you can modify `ParquetTableSource` a little bit so that it extends `StreamTableSource`.
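A rough sketch of that second option, assuming the Flink 1.10-era Table API interfaces (the class name and constructor here are illustrative, not from the thread): a bounded `StreamTableSource` that reads parquet files through `ParquetRowInputFormat`, which the blink planner can consume in batch mode.

```java
// Hypothetical sketch, not a tested implementation. Assumes Flink ~1.10
// with the flink-parquet dependency; adjust to your Flink version.
import org.apache.flink.core.fs.Path;
import org.apache.flink.formats.parquet.ParquetRowInputFormat;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.table.api.TableSchema;
import org.apache.flink.table.sources.StreamTableSource;
import org.apache.flink.types.Row;
import org.apache.parquet.schema.MessageType;

public class ParquetStreamTableSource implements StreamTableSource<Row> {

    private final Path path;
    private final MessageType parquetSchema;
    private final TableSchema tableSchema;

    public ParquetStreamTableSource(
            Path path, MessageType parquetSchema, TableSchema tableSchema) {
        this.path = path;
        this.parquetSchema = parquetSchema;
        this.tableSchema = tableSchema;
    }

    @Override
    public boolean isBounded() {
        // A finite (batch) source: the blink planner uses this flag
        // to apply batch semantics.
        return true;
    }

    @Override
    public DataStream<Row> getDataStream(StreamExecutionEnvironment env) {
        // Read the parquet files as a bounded stream of Rows.
        return env.createInput(new ParquetRowInputFormat(path, parquetSchema));
    }

    @Override
    public TableSchema getTableSchema() {
        return tableSchema;
    }
}
```

The table source could then be registered on a `TableEnvironment` created with the blink planner in batch mode and queried with SQL; `ParquetTableSource` itself also implements projection and filter push-down, which this minimal sketch leaves out.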

Best,
Jingsong Lee

On Wed, Feb 26, 2020 at 7:50 PM <[hidden email]> wrote:

