Inserting nullable data into NOT NULL columns

Inserting nullable data into NOT NULL columns

Gyula Fóra
Hi All!

We ran into a problem while trying to insert data read from Kafka into a table sink where some of the columns are not nullable.

The problem is that from Kafka we can only read nullable columns in JSON format; otherwise you get the following error:
org.apache.flink.table.api.ValidationException: Type STRING NOT NULL of table field 'first' does not match with the physical type STRING of the 'first' field of the TableSource return type.
On the other hand, no matter what we do with the nullable column (things like SELECT ... WHERE ... IS NOT NULL), the type will always be STRING and will be incompatible with the sink, leading to the following error:

Query schema: [first: STRING, ...]
Sink schema: [first: STRING NOT NULL, ...]
Any idea on how to resolve this type mismatch between nullable and non-nullable data? I feel that a query like (SELECT x FROM y WHERE x IS NOT NULL) should change the type to NOT NULL.
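
For reference, a minimal sketch of the kind of DDL that reproduces this mismatch (table names, column names, and connector options here are made up for illustration, not taken from the actual job):

```sql
-- Hypothetical Kafka source: the JSON format only yields nullable columns,
-- so 'first' must be declared plain STRING.
CREATE TABLE kafka_source (
  first STRING
) WITH (
  'connector.type' = 'kafka',
  'format.type' = 'json'
);

-- Hypothetical sink with a NOT NULL column.
CREATE TABLE my_sink (
  first STRING NOT NULL
) WITH (
  -- sink connector options omitted
);

-- This fails with the schema mismatch described above
-- (Query schema: [first: STRING] vs. Sink schema: [first: STRING NOT NULL]),
-- even though the WHERE clause filters out all NULL rows:
INSERT INTO my_sink
SELECT first FROM kafka_source WHERE first IS NOT NULL;
```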

Thanks 
Gyula

Re: Inserting nullable data into NOT NULL columns

Timo Walther
Hi Gyula,

A quick disclaimer: the type system rework is still ongoing, and there are a
couple of known issues and missing end-to-end tests around this topic.

I would therefore recommend declaring the sink column as `STRING NULL` for now.

Could you open an issue for your concrete use case with some example
source/query/sink SQL? I'd be happy to look into it.

Actually, `NULLIF()` should do the trick in the query, but unfortunately
the current Calcite behavior is not what one would expect.
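
A sketch of the suggested workaround, with the same illustrative names as above (this only relaxes the declared schema; NULL safety is then enforced at runtime by the external system rather than by the planner):

```sql
-- Workaround: declare the sink column nullable so the query and sink
-- schemas match; rely on the query to filter out NULLs.
CREATE TABLE my_sink (
  first STRING  -- nullable, instead of STRING NOT NULL
) WITH (
  -- sink connector options omitted
);

INSERT INTO my_sink
SELECT first FROM kafka_source WHERE first IS NOT NULL;
```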

Thanks,
Timo


On 09.04.20 15:53, Gyula Fóra wrote:
> [...]


Re: Inserting nullable data into NOT NULL columns

Gyula Fóra
Thanks Timo for the answer!

We are specifically working on implementing the Kudu table connectors and catalog, and we hit a problem while handling keys.
Naturally, columns that will be used as keys in Kudu cannot be null, so we set their type to NOT NULL when the catalog returns them.

Unfortunately, this made it nearly impossible to actually insert into those tables when getting data from Kafka, because of this issue.

As per your suggestion, we will work around this by not setting the key column types to NOT NULL, thus risking a runtime error for now. We expect to have a PR with the Kudu connectors open against the Bahir project next week; once that is merged and done, I will open an issue here to document the shortcoming.

Gyula

On Thu, Apr 9, 2020 at 5:30 PM Timo Walther <[hidden email]> wrote:
[...]