Is Flink able to parse strings into dynamic JSON?

classic Classic list List threaded Threaded
3 messages Options
Reply | Threaded
Open this post in threaded view
|

Is Flink able to parse strings into dynamic JSON?

devinbost
I'm wanting to know if it's possible in Flink to parse strings into a dynamic JSON object that doesn't require me to know the primitive type details at compile time. 
We have over 300 event types to process, and I need a way to load the types at runtime. I only need to know if certain fields exist on the incoming objects, and the object schemas are all different except for certain fields. 
Every example I can find shows Flink users specifying the full type information at compile time, but there's no way this will scale. 

It's possible for us to lookup primitive type details at runtime from JSON, but I'll still need a way to process that JSON in Flink to extract the metadata if it's required. So, that brings me back to the original issue. 

How can I do this in Flink? 

--
Devin G. Bost
Reply | Threaded
Open this post in threaded view
|

Re: Is Flink able to parse strings into dynamic JSON?

Timo Walther
Hi Devin,

Flink supports arbitrary data types. You can simply read the JSON object
as a big string first and process the individual event types in a UDF
using e.g. the Jackson library.

Are you using SQL or DataStream API?

An alternative is to set the "fail-on-missing-field" flag to false. This
allows you to read a field if it is present otherwise it will be null.
You can then simply list all fields you are interested in from all
different event types.

I hope this helps.

Regards,
Timo



On 28.01.21 05:44, Devin Bost wrote:

> I'm wanting to know if it's possible in Flink to parse strings into a
> dynamic JSON object that doesn't require me to know the primitive type
> details at compile time.
> We have over 300 event types to process, and I need a way to load the
> types at runtime. I only need to know if certain fields exist on the
> incoming objects, and the object schemas are all different except for
> certain fields.
> Every example I can find shows Flink users specifying the full type
> information at compile time, but there's no way this will scale.
>
> It's possible for us to lookup primitive type details at runtime from
> JSON, but I'll still need a way to process that JSON in Flink to extract
> the metadata if it's required. So, that brings me back to the original
> issue.
>
> How can I do this in Flink?
>
> --
> Devin G. Bost

Reply | Threaded
Open this post in threaded view
|

Re: Is Flink able to parse strings into dynamic JSON?

Chesnay Schepler
In reply to this post by devinbost
Flink needs to know upfront what kind of types it deals with to setup the serialization stack between operators.

As such, generally speaking, you will have to use some generic container for transmitting data (e.g., a String or a Jackson ObjectNode) and either work on them directly or
map them as required to specific type within the scope of a single function based on some custom logic.

There may be other approaches, but we'd have to know more about the specific use-case and requirements to help you there (e.g., what does your user interact with).
My understanding is that you have some single source for all these events, and now you want some user to define a pipeline processing a specific subset of these events?

On 1/28/2021 5:44 AM, Devin Bost wrote:
I'm wanting to know if it's possible in Flink to parse strings into a dynamic JSON object that doesn't require me to know the primitive type details at compile time. 
We have over 300 event types to process, and I need a way to load the types at runtime. I only need to know if certain fields exist on the incoming objects, and the object schemas are all different except for certain fields. 
Every example I can find shows Flink users specifying the full type information at compile time, but there's no way this will scale. 

It's possible for us to lookup primitive type details at runtime from JSON, but I'll still need a way to process that JSON in Flink to extract the metadata if it's required. So, that brings me back to the original issue. 

How can I do this in Flink? 

--
Devin G. Bost