I've been trying out the Table API for some ETL, using a two-stage job of CsvTableSink (DataSet) -> CsvInputFormat (Stream). I ran into an issue where the first stage writes rows whose trailing fields are null (which is valid output), and those rows cause a parse error in the second stage.
Looking at RowCsvInputFormatTest.java, I noticed that it expects input lines with a trailing delimiter, e.g. "a|b|c|". The CsvTableSink, on the other hand, writes rows without one, in the form "a|b|c". As long as 'c' is present, the RowCsvInputFormat parses this input successfully. However, if 'c' is declared as a number and its value is missing, so that the row is "a|b|", the Number parser fails on the empty string.
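To make the mismatch concrete, here is a minimal sketch in plain Java. It does not use the Flink classes at all: `writeRow`, `readRowStrict` and `readRowLenient` are hypothetical stand-ins I made up for illustration, and I am assuming the sink writes a null as an empty field, which is what the "a|b|" row suggests.

```java
import java.util.Arrays;

final class TrailingDelimiterSketch {

    // Hypothetical stand-in for the sink: joins fields with the delimiter,
    // writes null as an empty field, and adds no trailing delimiter.
    static String writeRow(Object[] fields, String delimiter) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < fields.length; i++) {
            if (i > 0) {
                sb.append(delimiter);
            }
            if (fields[i] != null) {
                sb.append(fields[i]);
            }
        }
        return sb.toString();
    }

    // Hypothetical stand-in for the reader: columns are (String, String, Integer).
    // The third column is handed to the number parser as-is, even when empty.
    static Object[] readRowStrict(String line) {
        String[] parts = line.split("\\|", -1); // -1 keeps trailing empty fields
        return new Object[] {parts[0], parts[1], Integer.parseInt(parts[2])};
    }

    // Same reader, but an empty numeric field is mapped to null instead.
    static Object[] readRowLenient(String line) {
        String[] parts = line.split("\\|", -1);
        Integer c = parts[2].isEmpty() ? null : Integer.valueOf(parts[2]);
        return new Object[] {parts[0], parts[1], c};
    }

    public static void main(String[] args) {
        String line = writeRow(new Object[] {"a", "b", null}, "|");
        System.out.println("written: " + line);
        System.out.println("lenient: " + Arrays.toString(readRowLenient(line)));
        try {
            readRowStrict(line);
        } catch (NumberFormatException e) {
            // The empty last field reaches the number parser and is rejected.
            System.out.println("strict failed: " + e.getMessage());
        }
    }
}
```

In this sketch the strict reader handles "a|b|3" fine and only fails once the last field is empty, which matches what I am seeing between the two stages.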
Is there something I am missing, or is there, in fact, an inconsistency between the TableSink and the InputFormat?