Flink SQL on JSON data without schema


Nihat Hosgur
Hi there,
We are evaluating Flink SQL to understand whether it would be a better fit than Spark. So far we have loved how natural it is to consume streams with Flink.

We read a number of Kafka topics, would like to join those streams, and eventually run some SQL queries over them. We've used Kafka tables, but if I'm not mistaken we must provide the schema up front. However, in many of our use cases we don't have schema information available up front. With Spark we didn't have to provide the schema. I wonder whether this is doable with Flink; if so, any reference or even some sample code would be appreciated.

Thanks,
Nihat
Re: Flink SQL on JSON data without schema

Fabian Hueske-2
Hi Nihat,

At the current state, Flink's SQL and Table APIs require a static schema.
You could use a JSON object as the value and implement scalar functions to extract fields, but that would not be very usable.

Best, Fabian
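[Editor's note: a minimal sketch of the extraction logic Fabian describes. The class and method names here are illustrative, not Flink API; in an actual Flink job one would wrap logic like this in a class extending `org.apache.flink.table.functions.ScalarFunction` (whose `eval` method Flink picks up by convention) and typically parse with a real JSON library such as Jackson rather than a regex.]

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: pull a string-valued field out of a flat JSON payload.
// A production UDF would use a proper JSON parser; the regex keeps this
// sketch dependency-free.
public class JsonField {

    public static String eval(String json, String field) {
        // Match  "field" : "value"  and capture the value.
        Matcher m = Pattern
            .compile("\"" + Pattern.quote(field) + "\"\\s*:\\s*\"([^\"]*)\"")
            .matcher(json);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        String event = "{\"user\": \"alice\", \"action\": \"click\"}";
        System.out.println(eval(event, "user"));   // prints "alice"
        System.out.println(eval(event, "action")); // prints "click"
    }
}
```

Registered as a scalar function, something like this lets each query decide which fields to pull out of the payload, at the cost of parsing the JSON on every call.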


Re: Flink SQL on JSON data without schema

Nihat Hosgur
Hi Fabian,
I just want to make sure there is no misunderstanding. What I've understood from your response is that, regardless of whether the table source is a KafkaTable or not, we need to provide a static schema.
Best,
Nihat

Re: Flink SQL on JSON data without schema

Fabian Hueske-2
The first level of the schema must be static, but a field can hold a complex object with nested data.
So you could have a schema with a single JSON object as a field.

Best, Fabian
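[Editor's note: a hypothetical sketch of what Fabian's suggestion could look like at query time, assuming a table (here called `events`) declared with a single VARCHAR column (here called `payload`) holding the raw JSON, and a registered scalar UDF (here called `JSON_FIELD`) that extracts a named field. All three names are illustrative, not part of Flink.]

```sql
-- The whole JSON record lives in one static column; fields are
-- extracted per query via the UDF instead of a declared schema.
SELECT
  JSON_FIELD(payload, 'user')   AS user_id,
  JSON_FIELD(payload, 'action') AS action
FROM events
```

This keeps the table schema static (one column) while deferring the per-field interpretation of the JSON to query time.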
