Hi,
I am trying some initial samples with Flink and have a question about data types. Flink supports Tuples (up to 25 fields), Java POJOs, primitive types, regular classes, etc. In my case I do not have a fixed type. I have metadata with field names and their types, e.g. (Id:int, Name:String, Salary:Double, Dept:String, ...). I do not know the number of fields, their names or their types until I receive the metadata. In this case, what source type should I go with? Please suggest; a small example would be of great help.

Scenario I am trying to solve:

Input:
Metadata: {"id":"int", "Name":"String","Salary":"Double","Dept":"String"}
Data file: a CSV file with data for the above fields

Required output: the average salary per department.

--
Thank you,
Madan.
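To make the problem concrete, here is a minimal plain-Java sketch (no Flink involved) of reading the metadata and turning each CSV line into a name-to-value map, i.e. the fully dynamic representation. It assumes the metadata is a flat JSON object whose values are all strings, that CSV columns appear in the same order as the metadata keys, that cells contain no quoted commas, and that only the types int, Double and String occur. DynamicRowParser, parseMetadata and parseLine are hypothetical names chosen for this illustration.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class DynamicRowParser {

    // Matches "key":"value" pairs in a flat JSON object whose values are strings.
    // Assumption: no nesting and no escaped quotes inside keys or values.
    private static final Pattern PAIR = Pattern.compile("\"([^\"]*)\"\\s*:\\s*\"([^\"]*)\"");

    /** Reads metadata such as {"id":"int","Name":"String"} into an ordered name -> type map. */
    static LinkedHashMap<String, String> parseMetadata(String json) {
        LinkedHashMap<String, String> schema = new LinkedHashMap<>();
        Matcher m = PAIR.matcher(json);
        while (m.find()) {
            schema.put(m.group(1), m.group(2)); // insertion order = metadata order
        }
        return schema;
    }

    /** Converts one CSV line into a name -> typed value map; empty cells become null. */
    static Map<String, Object> parseLine(LinkedHashMap<String, String> schema, String line) {
        String[] cells = line.split(",", -1); // -1 keeps trailing empty cells
        if (cells.length != schema.size()) {
            throw new IllegalArgumentException(
                "expected " + schema.size() + " columns but got " + cells.length);
        }
        Map<String, Object> row = new LinkedHashMap<>();
        int i = 0;
        for (Map.Entry<String, String> field : schema.entrySet()) {
            String cell = cells[i++].trim();
            row.put(field.getKey(), cell.isEmpty() ? null : convert(field.getValue(), cell));
        }
        return row;
    }

    // Maps the type names used in the metadata to Java values.
    private static Object convert(String type, String cell) {
        switch (type.toLowerCase()) {
            case "int":    return Integer.parseInt(cell);
            case "double": return Double.parseDouble(cell);
            case "string": return cell;
            default: throw new IllegalArgumentException("unsupported type: " + type);
        }
    }

    public static void main(String[] args) {
        LinkedHashMap<String, String> schema = parseMetadata(
            "{\"id\":\"int\", \"Name\":\"String\",\"Salary\":\"Double\",\"Dept\":\"String\"}");
        Map<String, Object> row = parseLine(schema, "1,Alice,5000.0,Sales");
        System.out.println(row); // {id=1, Name=Alice, Salary=5000.0, Dept=Sales}
    }
}
```

This keeps every field, which is flexible but means each record carries its own field names; the replies in this thread discuss ways around that.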
Hi,
For a truly dynamic class you would need a custom TypeInformation or TypeDeserializationSchema and store the fields in some kind of Map<String, String>. Maybe something could be done with inheritance, if records that always share the same fields could be deserialized to some specific class with fixed/predefined fields.

However, in your case it seems you can ignore all of the dynamic fields and just implement a deserializer that skips all fields except Dept and Salary. It could produce a simple POJO with those two fields, or even a Tuple2<String, Double>. If those fields are missing, set them to null and filter out the record, since you will not be able to use it for calculating your average anyway.

Piotrek

> On 11 Dec 2017, at 16:13, madan <[hidden email]> wrote:
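The projection approach Piotrek describes might look roughly like the following plain-Java sketch. It keeps only Dept and Salary, drops rows where either is missing or unparsable, and then averages salaries per department. Assumptions: simple comma-separated lines without quoting, column positions taken from the metadata order, and Map.Entry<String, Double> standing in for Flink's Tuple2<String, Double> so the example runs without Flink on the classpath. DeptSalaryProjection, project and averageByDept are hypothetical names.

```java
import java.util.AbstractMap.SimpleImmutableEntry;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.TreeMap;

public class DeptSalaryProjection {

    private final int deptIndex;
    private final int salaryIndex;
    private final int width;

    /** fieldNames: column names in CSV order, e.g. the keys of the metadata. */
    DeptSalaryProjection(List<String> fieldNames) {
        this.deptIndex = fieldNames.indexOf("Dept");
        this.salaryIndex = fieldNames.indexOf("Salary");
        this.width = fieldNames.size();
    }

    /** Keeps only (Dept, Salary); every other column is skipped. Unusable rows yield empty. */
    Optional<Map.Entry<String, Double>> project(String line) {
        if (deptIndex < 0 || salaryIndex < 0) {
            return Optional.empty(); // metadata has no Dept or Salary column
        }
        String[] cells = line.split(",", -1);
        if (cells.length != width) {
            return Optional.empty(); // malformed row
        }
        String dept = cells[deptIndex].trim();
        String salary = cells[salaryIndex].trim();
        if (dept.isEmpty() || salary.isEmpty()) {
            return Optional.empty(); // missing value: cannot contribute to the average
        }
        double value;
        try {
            value = Double.parseDouble(salary);
        } catch (NumberFormatException e) {
            return Optional.empty(); // salary is not a number
        }
        Map.Entry<String, Double> pair = new SimpleImmutableEntry<>(dept, value);
        return Optional.of(pair);
    }

    /** Average salary per department, accumulated as (sum, count) pairs. */
    static Map<String, Double> averageByDept(List<Map.Entry<String, Double>> pairs) {
        Map<String, double[]> sumAndCount = new TreeMap<>();
        for (Map.Entry<String, Double> p : pairs) {
            double[] acc = sumAndCount.computeIfAbsent(p.getKey(), k -> new double[2]);
            acc[0] += p.getValue();
            acc[1] += 1;
        }
        Map<String, Double> averages = new TreeMap<>();
        sumAndCount.forEach((dept, acc) -> averages.put(dept, acc[0] / acc[1]));
        return averages;
    }

    public static void main(String[] args) {
        DeptSalaryProjection proj =
            new DeptSalaryProjection(Arrays.asList("id", "Name", "Salary", "Dept"));
        List<String> lines = Arrays.asList(
            "1,Alice,5000.0,Sales",
            "2,Bob,3000.0,Sales",
            "3,Carol,4500.0,IT",
            "4,Dan,,IT"); // no salary: filtered out
        List<Map.Entry<String, Double>> pairs = new ArrayList<>();
        for (String line : lines) {
            proj.project(line).ifPresent(pairs::add);
        }
        System.out.println(averageByDept(pairs)); // {IT=4500.0, Sales=4000.0}
    }
}
```

In a Flink job, project would naturally sit inside a FlatMapFunction that emits nothing for unusable rows, with the sum/count aggregation done after grouping by department; the exact wiring depends on the Flink API in use.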
We’ve been using GenericRecords with custom serializers to do exactly this. For our use cases we need to run the same Flink pipeline for tens of thousands of different schemas, and generating code or building that many different jars just isn’t practical.
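A stdlib-only sketch of the idea behind this reply: one record representation whose field names and types come from a runtime schema, plus a serializer that writes and reads values according to that schema, so the same compiled pipeline can handle any schema. This is neither Avro's GenericRecord nor Flink's TypeSerializer API; Schema, FieldType and GenericRowSerializer are hypothetical names, and the binary layout (a null flag followed by the value, per field) is an assumption made for illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.util.Arrays;
import java.util.List;

public class GenericRowSerializer {

    enum FieldType { INT, DOUBLE, STRING }

    /** Runtime schema: parallel lists of field names and types (hypothetical helper type). */
    static final class Schema {
        final List<String> names;
        final List<FieldType> types;

        Schema(List<String> names, List<FieldType> types) {
            if (names.size() != types.size()) {
                throw new IllegalArgumentException("names/types length mismatch");
            }
            this.names = names;
            this.types = types;
        }
    }

    private final Schema schema;

    GenericRowSerializer(Schema schema) {
        this.schema = schema;
    }

    /** Name-based access, in the spirit of a generic record's get(fieldName). */
    Object field(Object[] values, String name) {
        int i = schema.names.indexOf(name);
        if (i < 0) throw new IllegalArgumentException("unknown field: " + name);
        return values[i];
    }

    /** Writes values positionally; the schema, not the bytes, says how to read each one back. */
    byte[] serialize(Object[] values) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(bytes);
        for (int i = 0; i < schema.types.size(); i++) {
            Object v = values[i];
            out.writeBoolean(v != null); // one-byte null marker per field
            if (v == null) continue;
            switch (schema.types.get(i)) {
                case INT:    out.writeInt((Integer) v); break;
                case DOUBLE: out.writeDouble((Double) v); break;
                case STRING: out.writeUTF((String) v); break;
            }
        }
        out.flush();
        return bytes.toByteArray();
    }

    /** Reads values back in schema order. */
    Object[] deserialize(byte[] data) throws IOException {
        DataInputStream in = new DataInputStream(new ByteArrayInputStream(data));
        Object[] values = new Object[schema.types.size()];
        for (int i = 0; i < values.length; i++) {
            if (!in.readBoolean()) continue; // field was null
            switch (schema.types.get(i)) {
                case INT:    values[i] = in.readInt(); break;
                case DOUBLE: values[i] = in.readDouble(); break;
                case STRING: values[i] = in.readUTF(); break;
            }
        }
        return values;
    }

    public static void main(String[] args) throws IOException {
        Schema schema = new Schema(
            Arrays.asList("id", "Name", "Salary", "Dept"),
            Arrays.asList(FieldType.INT, FieldType.STRING, FieldType.DOUBLE, FieldType.STRING));
        GenericRowSerializer serializer = new GenericRowSerializer(schema);
        Object[] row = {1, "Alice", 5000.0, null};
        Object[] copy = serializer.deserialize(serializer.serialize(row));
        System.out.println(Arrays.toString(copy)); // [1, Alice, 5000.0, null]
    }
}
```

Because the schema lives in the serializer rather than in each record, no per-record field names or type tags are written; in this layout each field costs one flag byte plus its value.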