Flink SQL on JSON data without schema


Nihat Hosgur
Hi there,
We are evaluating Flink SQL to understand whether it would be a better fit than Spark. So far we have loved how natural it is to consume streams with Flink.

We read a number of Kafka topics, would like to join those streams, and eventually run some SQL queries over them. We've used Kafka tables, but if I'm not mistaken we must provide the schema up front. However, in many of our use cases we don't have schema information available up front. With Spark we didn't have to provide the schema. I wonder whether this is doable with Flink; if so, any reference or even some sample code would be appreciated.

Thanks,
Nihat
Re: Flink SQL on JSON data without schema

Fabian Hueske-2
Hi Nihat,

At the current state, Flink's SQL and Table APIs require a static schema.
You could use a JSON object as the value and implement scalar functions to extract fields, but that would not be very usable.

Best, Fabian
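[Editor's note: a minimal sketch of the extraction logic Fabian describes. The class and method names here are illustrative, not Flink API; in an actual Flink job one would wrap logic like this in a class extending `org.apache.flink.table.functions.ScalarFunction` (whose `eval` method Flink picks up by convention) and typically parse with a real JSON library such as Jackson rather than a regex.]

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: pull a string-valued field out of a flat JSON payload.
// A production UDF would use a proper JSON parser; the regex keeps this
// sketch dependency-free.
public class JsonField {

    public static String eval(String json, String field) {
        // Match  "field" : "value"  and capture the value.
        Matcher m = Pattern
            .compile("\"" + Pattern.quote(field) + "\"\\s*:\\s*\"([^\"]*)\"")
            .matcher(json);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        String event = "{\"user\": \"alice\", \"action\": \"click\"}";
        System.out.println(eval(event, "user"));   // prints "alice"
        System.out.println(eval(event, "action")); // prints "click"
    }
}
```

Registered as a scalar function, something like this lets each query decide which fields to pull out of the payload, at the cost of parsing the JSON on every call.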


Re: Flink SQL on JSON data without schema

Nihat Hosgur
Hi Fabian,
I just want to make sure there is no misunderstanding. What I've understood from your response is that, regardless of whether the table source is a KafkaTable or not, we need to provide a static schema.
Best,
Nihat

Re: Flink SQL on JSON data without schema

Fabian Hueske-2
The first level of the schema must be static, but a field can hold a complex object with nested data.
So you could have a schema with a single JSON object as a field.

Best, Fabian
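[Editor's note: a hypothetical sketch of what Fabian's suggestion could look like at query time, assuming a table (here called `events`) declared with a single VARCHAR column (here called `payload`) holding the raw JSON, and a registered scalar UDF (here called `JSON_FIELD`) that extracts a named field. All three names are illustrative, not part of Flink.]

```sql
-- The whole JSON record lives in one static column; fields are
-- extracted per query via the UDF instead of a declared schema.
SELECT
  JSON_FIELD(payload, 'user')   AS user_id,
  JSON_FIELD(payload, 'action') AS action
FROM events
```

This keeps the table schema static (one column) while deferring the per-field interpretation of the JSON to query time.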
