Specifying Schema dynamically


Specifying Schema dynamically

Luqman Ghani
Hi,

I hope everyone is doing well.

I have a use case where we infer the schema from file headers and other information. In Flink, we can specify the schema of a stream with case classes or tuples. With tuples we cannot give names to the fields, and with case classes we would have to generate them on the fly. Is there any way of specifying the schema to Flink as a Map[String, Any], so that it can infer the stream's type from this map?

Like if a file has a header: id, first_name, last_name, last_login
and we infer schema as: Int, String, String, Long

Can we specify it as Map[String, Any]("id" -> Int, "first_name" -> String, "last_name" -> String, "last_login" -> Long)?

We want to use keyBy with field names instead of their indices. I hope there is a way :)

I was looking into dynamically creating case classes in Scala using scala-reflect, but I'm facing problems in getting that class and forwarding it to the Flink program.
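For illustration, the kind of inference meant above could be sketched in plain Scala (no Flink involved; the helper names and the very naive type rules are invented for this sketch):

```scala
// Hypothetical sketch: infer a runtime schema from a header line and one
// row of sample values. Names and inference rules are made up here.
object SchemaSketch {
  // Very naive type inference from a raw string value:
  // integral -> Int or Long depending on range, everything else -> String.
  def inferType(raw: String): Class[_] =
    if (raw.matches("-?\\d+")) {
      val asLong = raw.toLong
      if (asLong >= Int.MinValue && asLong <= Int.MaxValue) classOf[Int]
      else classOf[Long]
    } else classOf[String]

  // Pair each header field with the type inferred from the sample row.
  def inferSchema(header: Seq[String], sample: Seq[String]): Map[String, Class[_]] =
    header.zip(sample.map(inferType)).toMap

  def main(args: Array[String]): Unit = {
    val header = Seq("id", "first_name", "last_name", "last_login")
    val sample = Seq("42", "Luqman", "Ghani", "1486889456000")
    println(inferSchema(header, sample))
  }
}
```

The open question in the thread is how to hand such a runtime-only schema to Flink, whose DataStream types are fixed at compile time.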

Thanks,
Luqman

Re: Specifying Schema dynamically

Tzu-Li (Gordon) Tai
Hi Luqman,

From your description, it seems that you want to infer the type (case class, tuple, etc.) of a stream dynamically at runtime.
AFAIK, I don’t think this is supported in Flink. You’re required to have defined types for your DataStreams.

Could you also provide some example code of what the functionality you have in mind looks like?
That would help clarify if I have misunderstood and there’s actually a way to do it.

- Gordon
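One common workaround in this situation, rather than generating case classes, is to use a single generic row type whose schema lives in runtime data instead of in the type system; Flink's own generic row type (org.apache.flink.types.Row) follows the same idea. A minimal, Flink-free sketch (all names invented here):

```scala
// Sketch of a generic "row" whose schema is runtime data rather than a
// compile-time type. Illustrative only; not Flink's actual Row class.
final case class DynamicRow(fieldNames: IndexedSeq[String], values: IndexedSeq[Any]) {
  // Resolve field names to positions once, at construction time.
  private val index: Map[String, Int] = fieldNames.zipWithIndex.toMap
  def get(field: String): Any = values(index(field))
}

object DynamicRowDemo {
  def main(args: Array[String]): Unit = {
    val row = DynamicRow(
      Vector("id", "first_name", "last_name", "last_login"),
      Vector(42, "Luqman", "Ghani", 1486889456000L))
    println(row.get("first_name")) // prints Luqman
  }
}
```

The trade-off is that field access is untyped (Any), so type information from the inferred schema must be applied by the program, not checked by the compiler.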

On February 12, 2017 at 4:30:56 PM, Luqman Ghani ([hidden email]) wrote:

Like if a file has a header: id, first_name, last_name, last_login
and we infer schema as: Int, String, String, Long


Re: Specifying Schema dynamically

Luqman Ghani
Hi,

My case is very similar to what is described in this link about Spark:

I hope this clarifies it.

Thanks,
Luqman
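For the archives: the Spark approach alluded to above boils down to pairing untyped rows with a runtime schema (Spark's StructType) and resolving field names against it. Keying by field name under that model can be sketched in plain Scala like this (names invented for illustration; a real Flink job would do this inside a KeySelector):

```scala
// Group untyped "rows" by a field name known only at runtime,
// emulating what keyBy("last_name") would do if the schema came
// from a file header. Illustrative only.
object KeyByNameSketch {
  type Row = Map[String, Any]

  // Resolve the field name per row and group on its value.
  def keyBy(rows: Seq[Row], field: String): Map[Any, Seq[Row]] =
    rows.groupBy(_(field))

  def main(args: Array[String]): Unit = {
    val rows: Seq[Row] = Seq(
      Map("id" -> 1, "first_name" -> "Ada", "last_name" -> "Lovelace"),
      Map("id" -> 2, "first_name" -> "Alan", "last_name" -> "Turing"),
      Map("id" -> 3, "first_name" -> "Anne", "last_name" -> "Lovelace"))
    println(keyBy(rows, "last_name").keySet)
  }
}
```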
