Hello,
As a new Flink user I wondered if there are any existing approaches or practices for reading file formats such as CSV, TSV, etc. as DataSets or POJOs? My current approach can be illustrated with a contrived example:
While such a mapping could be implemented in a more general form, I'm keen to avoid reinventing the wheel and therefore wonder if there are already good ways of doing this?

Thanks - Elliot.
Hi Elliot,

Right now there is no tooling support for reading CSV/TSV data into a POJO, but there is a pull request open where a user contributes such a feature: https://github.com/apache/flink/pull/426

So it's probably only a matter of days until it is available in master.

Until then, you can make things a bit easier by using env.readCsvFile(). It will do the parsing into the types for you.

Sorry that the feature is not already available for you. Please let us know if you have more questions regarding Flink.

Best, Robert

On Thu, Mar 5, 2015 at 10:18 AM, Elliot West <[hidden email]> wrote:
Hi Elliot,

right now, I see the following options to read CSV/TSV files:

- Read CSV files (ExecutionEnvironment.readCsvFile()) into Tuples (max number of fields 25 for Java, 22 for Scala) and map the Tuples to POJOs in a subsequent Map function (if necessary). I would recommend this approach if the field limitation is not a problem for you. The CsvReader can be configured in several ways. For example, the record and field delimiters (',', '\t', ...) can be changed.
- Read the CSV file as a text file (ExecutionEnvironment.readTextFile()), which gives you each line of the file as a String. You can parse that line and create a POJO out of it in a subsequent Map function (just as you did in your example). This is more generic but leaves the parsing of the line up to you.

See the DataSource documentation for details:
0.8.1: http://ci.apache.org/projects/flink/flink-docs-release-0.8/programming_guide.html#data-sources
0.9-SNAPSHOT: http://ci.apache.org/projects/flink/flink-docs-master/programming_guide.html#data-sinks

Best, Fabian

2015-03-05 10:58 GMT+01:00 Robert Metzger <[hidden email]>:
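For illustration only (this code is not from the thread): the second option comes down to a small line-to-POJO function that a Map function would call for every String produced by readTextFile(). Below is a minimal sketch in plain Java, without the Flink wiring, so it can be run on its own. The Person class, its two fields, and the DelimitedLineParser name are hypothetical, and the parser assumes unquoted fields with no embedded delimiters.

```java
import java.util.regex.Pattern;

// Hypothetical helper: turns one delimited line into a POJO.
// In a Flink job this would be called from the Map function that follows
// ExecutionEnvironment.readTextFile(), as described in the second option above.
class DelimitedLineParser {

    // Hypothetical POJO with public fields and a public no-argument constructor.
    public static class Person {
        public String name;
        public int age;

        public Person() {}

        public Person(String name, int age) {
            this.name = name;
            this.age = age;
        }
    }

    // Splits the line on a literal delimiter (for example "," or "\t") and maps
    // the two expected fields onto a Person. Quoted fields are NOT handled.
    public static Person parse(String line, String delimiter) {
        // Pattern.quote treats the delimiter as plain text rather than a regex;
        // the -1 limit keeps trailing empty fields so the count check is reliable.
        String[] fields = line.split(Pattern.quote(delimiter), -1);
        if (fields.length != 2) {
            throw new IllegalArgumentException(
                "Expected 2 fields but found " + fields.length + ": " + line);
        }
        return new Person(fields[0].trim(), Integer.parseInt(fields[1].trim()));
    }

    public static void main(String[] args) {
        String[] lines = {"alice\t31", "bob\t42"};
        for (String line : lines) {
            Person p = parse(line, "\t");
            System.out.println(p.name + " is " + p.age);
        }
    }
}
```

Keeping the parsing in a plain static method makes it easy to unit test separately from the job. If the Tuple field limit is not a concern, the first option (readCsvFile()) avoids hand-written parsing altogether.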