+ Flink Users
From: Yun Tang <[hidden email]>
Sent: Monday, January 28, 2019 19:46
To: Soheil Pourbafrani
Subject: Re: How to infer table schema from Avro file
Hi Soheil
You should provide your generated Avro record class as the type parameter of AvroInputFormat, not Avro's GenericRecord class. For example, if your generated record is named 'Nation', the correct way to create the input should be:
AvroInputFormat<Nation> test = new AvroInputFormat<>(
By doing this, Flink recognizes the type of your input format as a 'PojoType' rather than a 'GenericType' (which has only one field), and the column names are inferred automatically.
Best
Yun Tang
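Yun Tang's point, that a concrete record class gives Flink named fields to work with while GenericRecord does not, can be illustrated without Flink at all. The following is a stdlib-only Java sketch, not Flink's actual type extractor: it reflects over a class's public instance fields to derive column names, and falls back to a single opaque column `f0` when there are none. `Nation` is a hypothetical hand-written stand-in for the Avro-generated class, and `java.util.Map` stands in for the `GenericRecord` interface.

```java
import java.lang.reflect.Field;
import java.lang.reflect.Modifier;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;

public class ColumnInference {

    // Hypothetical stand-in for an Avro-generated class:
    // one public field per field of the Avro schema.
    public static class Nation {
        public int N_NATIONKEY;
        public String N_NAME;
        public int N_REGIONKEY;
        public String N_COMMENT;
    }

    // Derives column names from the public, non-static fields of a type.
    // Names are sorted because getDeclaredFields() does not guarantee order.
    public static List<String> inferColumns(Class<?> type) {
        List<String> names = new ArrayList<>();
        for (Field f : type.getDeclaredFields()) {
            int mod = f.getModifiers();
            if (Modifier.isPublic(mod) && !Modifier.isStatic(mod)) {
                names.add(f.getName());
            }
        }
        Collections.sort(names);
        if (names.isEmpty()) {
            // Nothing to inspect: treat the whole object as one opaque column,
            // similar to the single "f0" column reported in the question.
            names.add("f0");
        }
        return names;
    }

    public static void main(String[] args) {
        // Typed record class: each field becomes a named column.
        System.out.println(inferColumns(Nation.class));
        // Generic container (stand-in for GenericRecord): only one opaque column.
        System.out.println(inferColumns(Map.class));
    }
}
```

Running `main` prints the four column names for `Nation` and only `[f0]` for the generic type. In the real setup, the class produced by the Avro compiler and Flink's own type extraction take the place of this hand-rolled reflection.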
From: Soheil Pourbafrani <[hidden email]>
Sent: Monday, January 28, 2019 5:54
To: user
Subject: How to infer table schema from Avro file

Hi, I load an Avro file into a Flink DataSet:
AvroInputFormat<GenericRecord> test = new AvroInputFormat<GenericRecord>(
and here are the results of printing DS:
{"N_NATIONKEY": 14, "N_NAME": "KENYA", "N_REGIONKEY": 0, "N_COMMENT": " pending excuses haggle furiously deposits. pending, express pinto beans wake fluffily past t"}
{"N_NATIONKEY": 15, "N_NAME": "MOROCCO", "N_REGIONKEY": 0, "N_COMMENT": "rns. blithely bold courts among the closely regular packages use furiously bold platelets?"}
{"N_NATIONKEY": 16, "N_NAME": "MOZAMBIQUE", "N_REGIONKEY": 0, "N_COMMENT": "s. ironic, unusual asymptotes wake blithely r"}
{"N_NATIONKEY": 17, "N_NAME": "PERU", "N_REGIONKEY": 1, "N_COMMENT": "platelets. blithely pending dependencies use fluffily across the even pinto beans. carefully silent accoun"}
{"N_NATIONKEY": 18, "N_NAME": "CHINA", "N_REGIONKEY": 2, "N_COMMENT": "c dependencies. furiously express notornis sleep slyly regular accounts. ideas sleep. depos"}
{"N_NATIONKEY": 19, "N_NAME": "ROMANIA", "N_REGIONKEY": 3, "N_COMMENT": "ular asymptotes are about the furious multipliers. express dependencies nag above the ironically ironic account"}
{"N_NATIONKEY": 20, "N_NAME": "SAUDI ARABIA", "N_REGIONKEY": 4, "N_COMMENT": "ts. silent requests haggle. closely express packages sleep across the blithely"}
Now I want to create a table from the DS DataSet with exactly the same schema as the Avro file, i.e. the columns should be N_NATIONKEY, N_NAME, N_REGIONKEY, and N_COMMENT.
I know that with the line:
tableEnv.registerDataSet("tbTest", usersDS, "field1, field2, ...");
I can create a table and set the column names, but I want the columns to be inferred automatically from the data. Is that possible?
I tried:
tableEnv.registerDataSet("tbTest", DS);
but it creates a table with the schema:
root
 |-- f0: GenericType<org.apache.avro.generic.GenericRecord>