Reading null value from datasets

Guido
Hello,
I would like to ask whether there is a particular way to read or handle null values (e.g. an empty field, as in "Name, Lastname,, Age...") in a dataset using readCsvFile, without being forced to ignore them.

Thanks for your time.
Guido


Re: Reading null value from datasets

Maximilian Michels
Hi Guido,

This depends on your use case, but you could read those fields as type String and handle the empty values accordingly.
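A minimal sketch of the read-as-String idea in plain Java, without any Flink API. The class and method names are hypothetical, and the fallback value of -1 for a missing age is an arbitrary choice for illustration. Each line is split so that empty fields survive as empty strings, and the program then decides per field what an empty value means. Note that String.split(",") without a negative limit silently drops trailing empty fields, which is one easy way to lose exactly the values in question.

```java
import java.util.Arrays;

// Hypothetical helper: keep every field as a String so empty fields are not lost.
public class StringFields {

    // split(",", -1) keeps trailing empty strings; plain split(",") would drop them.
    static String[] readAsStrings(String line) {
        return line.split(",", -1);
    }

    // Treat an empty field as "missing" and fall back to a caller-chosen default.
    static int ageOrDefault(String[] fields, int index, int defaultValue) {
        String raw = fields[index].trim();
        return raw.isEmpty() ? defaultValue : Integer.parseInt(raw);
    }

    public static void main(String[] args) {
        String[] fields = readAsStrings("Guido,,30");
        System.out.println(Arrays.toString(fields));      // [Guido, , 30]
        System.out.println(fields[1].isEmpty());          // true
        System.out.println(ageOrDefault(fields, 2, -1));  // 30
        System.out.println(ageOrDefault(readAsStrings("Max,Michels,"), 2, -1)); // -1
    }
}
```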

Cheers,
Max

On Fri, Oct 23, 2015 at 1:59 PM, Guido <[hidden email]> wrote:
Hello,
I would like to ask whether there is a particular way to read or handle null values (e.g. an empty field, as in "Name, Lastname,, Age...") in a dataset using readCsvFile, without being forced to ignore them.

Thanks for your time.
Guido



Re: Reading null value from datasets

Shiti Saxena
For a similar problem, where we wanted to preserve and track null entries, we load the CSV as a DataSet[Array[Object]] and then transform it into a DataSet[Row] using a custom RowSerializer (https://gist.github.com/Shiti/d0572c089cc08654019c) that handles null.

The Table API (which supports null) can then be used on the resulting DataSet[Row].
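A rough sketch of what a null-handling serializer has to do, assuming a simple encoding in which every field is preceded by a presence flag. It is plain Java using java.io streams and is restricted to String fields for brevity; it is not Flink's TypeSerializer interface and not the code from the gist above, and the class name is hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Hypothetical null-aware row encoding: field count, then per field a
// presence flag followed by the value only when the field is non-null.
public class NullableRowCodec {

    static byte[] write(String[] row) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(bytes);
            out.writeInt(row.length);              // number of fields
            for (String field : row) {
                out.writeBoolean(field != null);   // presence flag
                if (field != null) {
                    out.writeUTF(field);           // value only when present
                }
            }
            out.flush();
            return bytes.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    static String[] read(byte[] data) {
        try {
            DataInputStream in = new DataInputStream(new ByteArrayInputStream(data));
            String[] row = new String[in.readInt()];
            for (int i = 0; i < row.length; i++) {
                // Read the flag first; restore null when no value was written.
                row[i] = in.readBoolean() ? in.readUTF() : null;
            }
            return row;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        String[] copy = read(write(new String[] {"Guido", null, "30"}));
        System.out.println(copy[0] + " " + copy[1] + " " + copy[2]); // prints "Guido null 30"
    }
}
```

The per-field flag costs one byte per field; a bitmask over all fields would be more compact, at the price of slightly more bookkeeping.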

On Fri, Oct 23, 2015 at 7:40 PM, Maximilian Michels <[hidden email]> wrote:
Hi Guido,

This depends on your use case, but you could read those fields as type String and handle the empty values accordingly.

Cheers,
Max




Re: Reading null value from datasets

Maximilian Michels
As far as I know, null support was removed from the Table API because it was not consistently supported across all operations. See https://issues.apache.org/jira/browse/FLINK-2236


On Fri, Oct 23, 2015 at 7:20 PM, Shiti Saxena <[hidden email]> wrote:
For a similar problem, where we wanted to preserve and track null entries, we load the CSV as a DataSet[Array[Object]] and then transform it into a DataSet[Row] using a custom RowSerializer (https://gist.github.com/Shiti/d0572c089cc08654019c) that handles null.

The Table API (which supports null) can then be used on the resulting DataSet[Row].





Re: Reading null value from datasets

Stephan Ewen
Hi Guido!

If you use Scala, I would use an Option to represent nullable fields. That is a very explicit solution that marks which fields can be null, and also forces the program to handle this carefully.

We are looking to add support for Java 8's Optional type as well for exactly that purpose.
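A minimal sketch of the same idea with Java 8's java.util.Optional, parsing one line by hand. The Person class and its fields are hypothetical, and whether Flink's type system serializes Optional fields efficiently in a given version is not addressed here. Empty fields become Optional.empty(), so every consumer has to state explicitly what a missing value means.

```java
import java.util.Optional;

// Hypothetical record type for one CSV line where lastName and age may be missing.
public class Person {
    final String name;
    final Optional<String> lastName;
    final Optional<Integer> age;

    Person(String name, Optional<String> lastName, Optional<Integer> age) {
        this.name = name;
        this.lastName = lastName;
        this.age = age;
    }

    // Empty (or blank) field -> Optional.empty(); otherwise wrap the trimmed value.
    static Optional<String> field(String raw) {
        String trimmed = raw.trim();
        return trimmed.isEmpty() ? Optional.empty() : Optional.of(trimmed);
    }

    static Person parse(String line) {
        String[] f = line.split(",", -1); // -1 keeps trailing empty fields
        return new Person(f[0].trim(), field(f[1]), field(f[2]).map(Integer::parseInt));
    }

    public static void main(String[] args) {
        Person p = Person.parse("Guido,,30");
        // The caller must decide what a missing value means.
        System.out.println(p.lastName.orElse("<unknown>"));   // <unknown>
        System.out.println(p.age.map(a -> a + 1).orElse(-1)); // 31
    }
}
```

In Scala the same shape would use Option[String] and Option[Int] fields, with None for the empty values.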

Greetings,
Stephan


On Mon, Oct 26, 2015 at 10:27 AM, Maximilian Michels <[hidden email]> wrote:
As far as I know, null support was removed from the Table API because it was not consistently supported across all operations. See https://issues.apache.org/jira/browse/FLINK-2236