CREATE TABLE with Schema derived from format

CREATE TABLE with Schema derived from format

Gyula Fóra
Hi All!

I am wondering if it would be possible to change the CREATE TABLE statement so that it would also work without specifying any columns.

The format generally defines the available columns so maybe we could simply use them as is if we want.

This would be very helpful when exploring different data sources.

Let me know what you think!
Gyula

Re: CREATE TABLE with Schema derived from format

Jark Wu-3
Hi Gyula,

That's a good point and is on the roadmap. 

In 1.10, the JSON and CSV formats can derive the format schema from the table schema, so you no longer need to specify the format schema in the properties if you are using 1.10.

Conversely, we are planning to derive the table schema from the format schema if one is specified, e.g. via "format.fields" or "format.avro-file-path".
Furthermore, the table schema could be inferred from a schema registry, or even by reading some data and inferring it.
I created FLINK-16420 to track this effort, but I am not sure we will have enough time to support it before 1.11.
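
To illustrate what deriving a table schema from a format schema could involve, here is a minimal sketch that maps an Avro record schema to column definitions and renders a CREATE TABLE statement. It is not how Flink implements this; the type mapping, the function names, and the property key are assumptions made for the example, and only primitive types and nullable unions are handled.

```python
import json

# Hypothetical mapping from Avro primitive types to SQL column types.
AVRO_TO_SQL = {
    "string": "STRING",
    "int": "INT",
    "long": "BIGINT",
    "float": "FLOAT",
    "double": "DOUBLE",
    "boolean": "BOOLEAN",
    "bytes": "BYTES",
}


def sql_type(avro_type):
    """Map an Avro field type to a SQL type (primitives and nullable unions only)."""
    if isinstance(avro_type, list):
        # A union such as ["null", "long"] is treated as a nullable column.
        non_null = [t for t in avro_type if t != "null"]
        if len(non_null) != 1:
            raise ValueError(f"unsupported union: {avro_type}")
        return sql_type(non_null[0])
    if isinstance(avro_type, str) and avro_type in AVRO_TO_SQL:
        return AVRO_TO_SQL[avro_type]
    raise ValueError(f"unsupported Avro type: {avro_type}")


def derive_columns(avro_schema_json):
    """Return (column name, SQL type) pairs for an Avro record schema given as JSON text."""
    schema = json.loads(avro_schema_json)
    if schema.get("type") != "record":
        raise ValueError("expected an Avro record schema")
    return [(field["name"], sql_type(field["type"])) for field in schema["fields"]]


def create_table_ddl(table, columns, properties):
    """Render a CREATE TABLE statement from derived columns and connector properties."""
    cols = ",\n".join(f"  {name} {type_}" for name, type_ in columns)
    props = ",\n".join(f"  '{key}' = '{value}'" for key, value in properties.items())
    return f"CREATE TABLE {table} (\n{cols}\n) WITH (\n{props}\n)"
```

With such a helper, a user would only supply the table name and connector properties, and the column list would come from the format schema.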

Best,
Jark

Re: CREATE TABLE with Schema derived from format

Gyula Fóra
Hi Jark,

Thank you for the clarification, this is exactly what I was looking for, especially the second part regarding schema registry integration.

This question came up as we were investigating what the schema registry integration should look like :)

Cheers,
Gyula


Re: CREATE TABLE with Schema derived from format

Jark Wu-3
Yes. From my perspective, deriving the schema from a schema registry is the most important use case of FLINK-16420.

Some initial ideas about this:
1) Introduce a SchemaRegisteryCatalog to allow users to run queries on existing topics without a manual table definition. See FLINK-12256.
2) Provide a connector property for the schema registry URL to derive the schema from it, so that the CREATE TABLE statement can leave out the schema part, e.g.

CREATE TABLE user_behavior WITH ("connector"="kafka", "topic"="user_behavior", "schema.registery.url"="localhost:8081")
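
As a rough sketch of what option 2) would need to do behind the scenes, the snippet below builds the lookup URL for a topic's latest value schema and unpacks the registry response. It assumes a Confluent-style Schema Registry REST API and the default `<topic>-value` subject naming, neither of which is stated in this thread; the function names and the `http://` scheme are also assumptions for the example.

```python
import json
from urllib.parse import quote


def latest_value_schema_url(registry_url, topic):
    """Build the URL of the latest value schema for a topic.

    Assumes the subject is named "<topic>-value" (an assumption, not from the thread).
    """
    subject = f"{topic}-value"
    return f"{registry_url.rstrip('/')}/subjects/{quote(subject, safe='')}/versions/latest"


def schema_from_response(body):
    """Extract the schema from a registry response body.

    The response is assumed to be a JSON object whose "schema" field holds
    the schema itself as a JSON-encoded string.
    """
    payload = json.loads(body)
    return json.loads(payload["schema"])
```

The returned schema could then be translated into the table's column list, so the DDL itself would not have to spell the columns out.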

Which approach are you looking for?

Best,
Jark


Re: CREATE TABLE with Schema derived from format

Gyula Fóra
Hi!

Initially we were looking at 2), but 1) would be the best solution. I think both would be very valuable.

My only concern related to using the Schema Registry as a Catalog is the interaction with other Catalogs in the system. Maybe you are using a Hive catalog to track a bunch of tables, and now you would have to switch to the Schema Registry.
Maybe in this case it would be good to be able to import tables from one catalog to another.

Gyula

