Hi squirrels, I am working on a Flink job connecting to an Oracle DB. I started from the JDBC example for Derby and used the TupleTypeInfo to configure the fields of the tuple as it is read. The record of the example has 2 INT, 1 FLOAT and 2 VARCHAR fields. With Oracle, however, all the numbers are read as Double, causing a ClassCastException. Of course I can fix it by changing the TupleTypeInfo, but I wonder whether there is some assumption for Oracle and numbers. Thanks a lot for your support! saluti, Stefano
The JDBC formats don't make any assumption as to what DB backend is used.
A JDBC FLOAT is in general returned as a double, since that was the recommended mapping I found when I wrote the formats. Is the INT returned as a double as well? Note: the (runtime) output type is in no way connected to the TypeInfo you pass when constructing the format.

On 21.03.2016 14:16, Stefano Bortoli wrote:
> Apparently, using Oracle, all the numbers are read as Double, causing
> a ClassCastException.
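For reference, Oracle's NUMBER columns (which back both INT-like and FLOAT-like declarations) come back from the driver as java.math.BigDecimal, which then gets narrowed to a double. A minimal sketch of that conversion; the helper name is hypothetical, not the actual format code:

```java
import java.math.BigDecimal;

public class NumberMapping {
    // Hypothetical helper mirroring the narrowing described above:
    // any BigDecimal coming from the driver becomes a Double.
    static Object toTupleField(Object jdbcValue) {
        if (jdbcValue instanceof BigDecimal) {
            return ((BigDecimal) jdbcValue).doubleValue();
        }
        return jdbcValue;
    }

    public static void main(String[] args) {
        // An Oracle NUMBER column arrives as BigDecimal, so a Tuple field
        // declared as Integer sees a Double instead -> ClassCastException.
        Object oracleNumber = new BigDecimal("42");
        System.out.println(toTupleField(oracleNumber).getClass().getSimpleName()); // prints "Double"
    }
}
```

This is why declaring the field as INT_TYPE_INFO in the TupleTypeInfo does not change what actually lands in the tuple at runtime.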
I had a look at the JDBC input format, and it does indeed interpret DECIMAL and NUMERIC values (returned as BigDecimal) as double. The state of the JDBCInputFormat is not adequate for real-world use cases; for example, it does not deal with NULL values. However, with a little effort we fixed a few things and now we are getting to something usable. We are actually trying to do something a-la sqoop: given a boundary query, we create the splits and then assign them to the input format to read the database with configurable parallelism. We are still working on it. If we get to something stable and working, we'll gladly share it. saluti, Stefano

2016-03-22 15:46 GMT+01:00 Chesnay Schepler <[hidden email]>:
> The JDBC formats don't make any assumption as to what DB backend is used.
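The boundary-query idea above can be sketched in isolation: run something like `SELECT MIN(id), MAX(id) FROM t`, then cut the resulting interval into one range per parallel reader. The class and method names below are made up for illustration; this is not the code we wrote:

```java
public class BoundarySplits {
    // Given the min/max from a boundary query and a desired parallelism,
    // compute inclusive [lo, hi] key ranges, one per split, that together
    // cover the whole interval with sizes differing by at most one.
    static long[][] computeSplits(long min, long max, int parallelism) {
        long total = max - min + 1;
        long base = total / parallelism;
        long rem = total % parallelism;
        long[][] splits = new long[parallelism][2];
        long lo = min;
        for (int i = 0; i < parallelism; i++) {
            long size = base + (i < rem ? 1 : 0); // spread the remainder over the first splits
            splits[i][0] = lo;
            splits[i][1] = lo + size - 1;
            lo += size;
        }
        return splits;
    }

    public static void main(String[] args) {
        // ids 1..10 over 3 readers -> 1-4, 5-7, 8-10
        for (long[] s : computeSplits(1, 10, 3)) {
            System.out.println(s[0] + "-" + s[1]);
        }
    }
}
```

Each range then becomes a WHERE clause (e.g. `id BETWEEN ? AND ?`) handed to one input-format instance.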
On 23.03.2016 10:04, Stefano Bortoli wrote:
> This sounds more like a bug actually. Feel free to open a JIRA for this.

This was already reported in FLINK-3471. To clarify: for NULL fields the format fails only if the type is DECIMAL, NUMERIC, DATE, TIME, TIMESTAMP, or SQLXML. Other types default to 0, the empty string, or false; this actually isn't intended behavior, but is caused by JDBC itself. Defaulting to some value seems the only way to deal with this issue, since we can't store null in a Tuple. I wasn't sure what value DATE, TIME, TIMESTAMP and SQLXML should default to, so I haven't changed them yet. I also just dislike the fact that a straight copy from A to B will not produce the same table.
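Since a Tuple cannot hold null, the defaulting dilemma above boils down to picking a fallback per type. A minimal sketch of null-safe extraction for a DECIMAL/NUMERIC column; the helper name and the 0.0 default are arbitrary choices for illustration, not what the format does:

```java
import java.math.BigDecimal;

public class NullDefaults {
    // Hypothetical null-safe extraction: DECIMAL/NUMERIC columns arrive as
    // BigDecimal (possibly null); because a Tuple field cannot be null,
    // a caller-supplied default is substituted for NULL columns.
    static double decimalOrDefault(BigDecimal value, double dflt) {
        return value == null ? dflt : value.doubleValue();
    }

    public static void main(String[] args) {
        System.out.println(decimalOrDefault(null, 0.0));                  // NULL column -> prints 0.0
        System.out.println(decimalOrDefault(new BigDecimal("3.5"), 0.0)); // prints 3.5
    }
}
```

The drawback is exactly the one noted above: a NULL and a genuine 0 become indistinguishable, so a straight copy from A to B will not reproduce the source table.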
Actually, this was done so that every field is a primitive or String.

On 23.03.2016 10:38, Chesnay Schepler wrote:
> This was already reported in FLINK-3471. To clarify: for NULL fields the
> format fails only if the type is DECIMAL, NUMERIC, DATE, TIME, TIMESTAMP,
> or SQLXML. Other types default to 0, the empty string, or false; this
> actually isn't intended behavior, but is caused by JDBC itself.
Thanks for the clarification. This causes both a NullPointerException on null values and a ClassCastException on Double when serializing the tuple:

    case java.sql.Types.DECIMAL:
        reuse.setField(resultSet.getBigDecimal(pos + 1).doubleValue(), pos);
        break;
    case java.sql.Types.NUMERIC:
        if (resultSet.getBigDecimal(pos + 1) == null)
            reuse.setField("", pos);
        else
            reuse.setField(resultSet.getBigDecimal(pos + 1).toPlainString(), pos);
        break;

2016-03-23 10:38 GMT+01:00 Chesnay Schepler <[hidden email]>:
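One way to make the two branches above consistent is to run both DECIMAL and NUMERIC through the same null-safe conversion. This is only a sketch of that idea, not a committed fix; the "plain string, empty on NULL" policy is just one possible choice:

```java
import java.math.BigDecimal;

public class ConsistentDecimalHandling {
    // Sketch: handle DECIMAL and NUMERIC identically by converting the
    // (possibly null) BigDecimal to its plain string form, defaulting to
    // the empty string, so neither branch can throw on a NULL column.
    static String decimalToString(BigDecimal value) {
        return value == null ? "" : value.toPlainString();
    }

    public static void main(String[] args) {
        System.out.println("[" + decimalToString(null) + "]");            // prints "[]"
        System.out.println(decimalToString(new BigDecimal("10.50")));     // prints "10.50"
    }
}
```

With both cases routed through one helper, a NULL column can no longer trigger a NullPointerException in only one of the two branches.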