when run progrm on big data customer 2.5GB orders 5GB disply error why
DataSource (at getCustomerDataSet(TPCHQuery3.java:252) (org.apache.flink.api.java.io.CsvInputFormat)) (1/1) switched to FAILED org.apache.flink.api.common.io.ParseException: Row too short: 14999999|Customer#014999999|3emQ49UZtlfjeK at org.apache.flink.api.common.io.GenericCsvInputFormat.parseRecord(GenericCsvInputFormat.java:283) at org.apache.flink.api.java.io.CsvInputFormat.readRecord(CsvInputFormat.java:236) at org.apache.flink.api.java.io.CsvInputFormat.readRecord(CsvInputFormat.java:42) at org.apache.flink.api.common.io.DelimitedInputFormat.nextRecord(DelimitedInputFormat.java:489) at org.apache.flink.api.java.io.CsvInputFormat.nextRecord(CsvInputFormat.java:204) at org.apache.flink.api.java.io.CsvInputFormat.nextRecord(CsvInputFormat.java:42) at org.apache.flink.runtime.operators.DataSourceTask.invoke(DataSourceTask.java:148) at org.apache.flink.runtime.execution.RuntimeEnvironment.run(RuntimeEnvironment.java:257) at java.lang.Thread.run(Thread.java:724) ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment(); DataSet<Customer> customers = getCustomerDataSet(env,mask); DataSet<Orders> orders= getOrdersDataSet(env,maskorder); DefaultJoin<Customer,Orders> result = customers.join(orders).where(0).equalTo(1); result.print(); result .writeAsCsv("/home/hadoop/Desktop/Dataset/inoptmization.csv", "\n", "|",WriteMode.OVERWRITE); env.execute(); } public static class Customer extends Tuple3<Long,String,String> { } public static class Orders extends Tuple2<Long,Long> { } private static DataSet<Customer> getCustomerDataSet(ExecutionEnvironment env,String mask) { return env.readCsvFile("/home/hadoop/Desktop/Dataset/customer.csv") .fieldDelimiter('|') .includeFields(mask).ignoreFirstLine() .tupleType(Customer.class); } |
Hi, the problem is that Flink is trying to parse the input data as CSV, but there seem to be rows in the data which do not conform to the specified schema. On Thu, Jun 18, 2015 at 12:51 PM, hagersaleh <[hidden email]> wrote: when run progrm on big data customer 2.5GB orders 5GB disply error why |
Free forum by Nabble | Edit this page |