Explanation on limitations of the Flink Table API

4 messages
Explanation on limitations of the Flink Table API

Simone Robutti
Hello,

I would like to know if it's possible to create a Flink Table from an arbitrary CSV (or any other form of tabular data) without doing type-safe parsing with explicitly declared type classes/POJOs.

To my knowledge this is not possible, but I would like to know if I'm missing something. My requirement is to be able to read a CSV file and manipulate it, reading the field names from the file and inferring the data types.

Thanks,

Simone

Re: Explanation on limitations of the Flink Table API

Fabian Hueske-2
Hi Simone,

in Flink 1.0.x, the Table API does not support reading external data, i.e., it is not possible to read a CSV file directly from the Table API.
Tables can only be created from DataSet or DataStream which means that the data is already converted into "Flink types".
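(To illustrate what "already converted into Flink types" implies, here is a language-agnostic Python sketch, not Flink code: the schema and the parsing must be declared explicitly before any Table can be created. The `Person` record and `parse_line` helper are hypothetical.)

```python
from dataclasses import dataclass

@dataclass
class Person:
    # Hypothetical schema, declared up front, analogous to a POJO/case class
    name: str
    age: int

def parse_line(line: str) -> Person:
    # Explicit, type-safe parsing of one CSV line into the declared record
    name, age = line.split(",")
    return Person(name=name, age=int(age))

rows = [parse_line(l) for l in ["Alice,30", "Bob,25"]]
```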

However, the Table API is currently under heavy development as part of the efforts to add SQL support.
This work is taking place on the master branch and I am currently working on interfaces to scan external data sets or ingest external data streams.
The interface will be quite generic such that it should be possible to define a table source that reads the first lines of a file to infer attribute names and types.
You can have a look at the current state of the API design here [1].
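(A rough sketch of what such a table source could do — plain Python, not the proposed Flink interface; `infer_schema` and its sampling strategy are hypothetical illustrations of reading the first lines of a file to derive attribute names and types.)

```python
import csv
import io

def infer_schema(text: str, sample_lines: int = 100):
    """Read the header for attribute names and sample rows to guess types.

    Each column starts as int and is widened to float, then str, as soon
    as a sampled value no longer fits the narrower type.
    """
    reader = csv.reader(io.StringIO(text))
    names = next(reader)
    types = [int] * len(names)
    for i, row in enumerate(reader):
        if i >= sample_lines:
            break
        for col, value in enumerate(row):
            if types[col] is int:
                try:
                    int(value)
                except ValueError:
                    types[col] = float
            if types[col] is float:
                try:
                    float(value)
                except ValueError:
                    types[col] = str
    return list(zip(names, types))
```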

Feedback is welcome and can be very easily included in this phase of the development ;-)

Cheers, Fabian

2016-04-21 14:26 GMT+02:00 Simone Robutti <[hidden email]>:

Re: Explanation on limitations of the Flink Table API

Flavio Pompermaier
We're also trying to work around the current limitations of the Table API: we're reading DataSets with purpose-built input formats that return a POJO Row containing the list of values (but we're reading all values as Strings...).

Actually, we would also need a way to abstract the composition of Flink operators and UDFs, so that a transformation can be composed from a graphical UI or from a script. During the Stratosphere project there were Meteor and Sopremo allowing that [1], but they were dismissed in favour of a Pig integration that I don't know whether was ever completed. Some days ago I discovered the Piglet project [2], which allows using Pig with Spark and Flink, but I don't know how well it works (the Flink integration is also very recent and not documented anywhere).
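(For illustration, the string-valued Row workaround described above might look like the following Python sketch — not the actual input format; the generic record is just a header-keyed map of raw strings, so no POJO has to be declared in advance.)

```python
import csv
import io

def read_rows(text: str):
    """Workaround sketch: every record becomes a generic row of strings
    keyed by the header, with no schema class declared up front."""
    reader = csv.reader(io.StringIO(text))
    header = next(reader)
    return [dict(zip(header, row)) for row in reader]

rows = read_rows("name,age\nAlice,30\nBob,25")
```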

Best,
Flavio


On Thu, Apr 21, 2016 at 2:41 PM, Fabian Hueske <[hidden email]> wrote:


Re: Explanation on limitations of the Flink Table API

Simone Robutti
In reply to this post by Fabian Hueske-2
Thanks for all your input.

The design document covers the use cases we have in mind, and querying external sources may also be interesting to us for other purposes not mentioned in the first mail.

I will wait for developments in this direction, because the expected result seems promising. :)

Thank you again,

Simone
