Hi all, When using the parseQuotedStrings function of the CsvReader class, I have noticed that if the quote character also appears inside the string, the parsing fails. "RT @sportsguy33: New Time Warner slogan: "Time Warner, where we make you long for the days before cable."" RT @sportsguy33: New Time Warner slogan: "Time Warner, where we make you long for the days before cable." I have found the part of the Flink code that raises this exception and can fix it, but first wanted to check whether others agree that this is an issue. Cheers, Tamara |
Hi Tamara, Quoted strings should not contain the quoting character. The way to work around this is to escape the quote characters. However, there is currently no option to escape quotes, which pretty much forbids any use of quote characters within quoted fields. This should be fixed. I opened a JIRA for this issue: https://issues.apache.org/jira/browse/FLINK-2567 While your proposal is a very convenient feature, I think we should rather implement explicit quoting for performance and clarity reasons. On Mon, Aug 24, 2015 at 10:40 AM, Tamara Mendt <[hidden email]> wrote:
|
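To make the escaping idea above concrete, here is a minimal sketch of a quoted-field parser that treats an escape character followed by the quote character as a literal quote. It is not Flink's implementation: the class name `QuotedFieldSketch`, the method `parseQuotedField`, and the choice of a backslash as the escape character are all assumptions made for illustration.

```java
// Hypothetical illustration only; none of these names come from Flink.
final class QuotedFieldSketch {

    /**
     * Parses a single quoted field that starts at the beginning of input.
     * An escape character directly followed by the quote character is
     * emitted as a literal quote; an unescaped quote ends the field.
     * Simplification: escaping the escape character itself is not handled.
     */
    static String parseQuotedField(String input, char quote, char escape) {
        if (input.isEmpty() || input.charAt(0) != quote) {
            throw new IllegalArgumentException("Field must start with the quote character");
        }
        StringBuilder field = new StringBuilder();
        int i = 1; // skip the opening quote
        while (i < input.length()) {
            char c = input.charAt(i);
            if (c == escape && i + 1 < input.length() && input.charAt(i + 1) == quote) {
                field.append(quote); // escaped quote becomes part of the value
                i += 2;
            } else if (c == quote) {
                return field.toString(); // unescaped quote closes the field
            } else {
                field.append(c);
                i++;
            }
        }
        throw new IllegalArgumentException("Unterminated quoted field");
    }

    public static void main(String[] args) {
        // Raw field text: "New slogan: \"Time Warner\" rocks"
        String raw = "\"New slogan: \\\"Time Warner\\\" rocks\"";
        System.out.println(parseQuotedField(raw, '"', '\\'));
    }
}
```

With explicit escaping the parser never has to guess whether a quote ends the field, which is one way to read the performance and clarity argument: each character is inspected once and no lookahead past a single character is needed.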
Thank you Maximilian, I agree and would be happy to fix this issue. On Mon, Aug 24, 2015 at 11:50 AM, Maximilian Michels <[hidden email]> wrote:
-- Tamara Mendt
|
Be aware that the CSV input format extends the delimited input format. The delimited input format splits at the line delimiter (such as \n) without awareness of quotes. So the line delimiter can never be part of a quoted field... On Mon, Aug 24, 2015 at 11:55 AM, Tamara Mendt <[hidden email]> wrote:
|
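The point about the line delimiter can be shown with a small sketch of quote-unaware record splitting. This is not the code of Flink's delimited input format; the class `LineSplitSketch` and the method `splitNaive` are hypothetical names, and the sketch only assumes the behaviour described above: the input is cut at every line delimiter before any field parsing happens.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only; none of these names come from Flink.
final class LineSplitSketch {

    /** Splits the data at every line delimiter, ignoring quotes entirely. */
    static List<String> splitNaive(String data, char lineDelimiter) {
        List<String> records = new ArrayList<>();
        int start = 0;
        for (int i = 0; i < data.length(); i++) {
            if (data.charAt(i) == lineDelimiter) {
                records.add(data.substring(start, i));
                start = i + 1;
            }
        }
        if (start < data.length()) {
            records.add(data.substring(start)); // trailing record without delimiter
        }
        return records;
    }

    public static void main(String[] args) {
        // Two logical rows; the first has a newline inside its quoted field.
        String data = "1,\"first line\nsecond line\"\n2,\"plain\"\n";
        for (String record : splitNaive(data, '\n')) {
            System.out.println("[" + record + "]");
        }
    }
}
```

Because the split happens first, the quoted field in the first row is cut into two records, and the field parser then sees an unterminated quote. Escaping the quote character would not help with this case.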