I'd like to know whether it's possible in Flink to parse strings into a dynamic JSON object without knowing the primitive type details at compile time.

We have over 300 event types to process, and I need a way to load the types at runtime. I only need to know whether certain fields exist on the incoming objects; the object schemas are all different except for certain shared fields. Every example I can find shows Flink users specifying the full type information at compile time, but there's no way this will scale. It's possible for us to look up primitive type details at runtime from the JSON, but I'll still need a way to process that JSON in Flink to extract the metadata when it's required. So, that brings me back to the original issue. How can I do this in Flink?

--
Devin G. Bost
Hi Devin,
Flink supports arbitrary data types. You can simply read the JSON object as one big string first and process the individual event types in a UDF using, e.g., the Jackson library. Are you using SQL or the DataStream API?

An alternative is to set the "fail-on-missing-field" flag to false. This lets you read a field if it is present; otherwise it will be null. You can then simply list all the fields you are interested in across the different event types.

I hope this helps.

Regards,
Timo
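A minimal sketch of the first suggestion: treat the payload as an opaque string and parse it with a generic JSON library inside a single function. The field names below are hypothetical, and the logic is shown standalone in Python for brevity; in a Java pipeline the equivalent would use Jackson's ObjectMapper/ObjectNode inside a map()/flatMap() UDF on a DataStream of Strings.

```python
import json

# Fields we care about across all ~300 event types (hypothetical names).
COMMON_FIELDS = ["event_type", "timestamp", "user_id"]

def extract_common_fields(raw: str) -> dict:
    """Parse a raw JSON string into a generic dict and pull out only the
    fields of interest, mapping missing fields to None (mirroring the
    semantics of fail-on-missing-field = false)."""
    obj = json.loads(raw)
    return {field: obj.get(field) for field in COMMON_FIELDS}

# In Flink this body would live inside a UDF; shown standalone here.
print(extract_common_fields('{"event_type": "click", "user_id": 42}'))
# → {'event_type': 'click', 'timestamp': None, 'user_id': 42}
```

Because the container is a plain dict (or ObjectNode in Java), no per-event-type class needs to exist at compile time.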
Flink needs to know upfront what kinds of types it deals with in order to set up the serialization stack between operators.

As such, generally speaking, you will have to use some generic container for transmitting data (e.g., a String or a Jackson ObjectNode) and either work on it directly or map it to a specific type, within the scope of a single function, based on some custom logic.
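The "generic container plus custom logic" idea might look like the following sketch: keep events as parsed-but-untyped objects and, within one function, decide which handling applies based on which fields are present. The discriminator fields and pipeline names are invented for illustration; in Java the container would typically be Jackson's ObjectNode inside a ProcessFunction.

```python
import json

def route(raw: str) -> str:
    """Inspect a generic JSON container and dispatch on which known
    fields are present, with no compile-time schema per event type.
    (Hypothetical routing logic for illustration.)"""
    obj = json.loads(raw)
    if "order_id" in obj:
        return "order-pipeline"
    if "session_id" in obj:
        return "session-pipeline"
    return "unknown"

print(route('{"order_id": 7}'))  # → order-pipeline
```

Only the routing function needs to know about the discriminating fields; the 300+ full schemas never need to be declared to Flink.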
There may be other approaches, but we'd have to know more about the specific use case and requirements to help you there (e.g., what does your user interact with?).

My understanding is that you have a single source for all these events, and you now want some user to define a pipeline that processes a specific subset of them?