I'm trying to extract data from a Debezium CDC source, in which one of the backing tables has an open-schema nested JSON field like this:
"objectives": {
  "items": [
    {
      "id": 1,
      "label": "test 1",
      "size": 1000.0
    },
    {
      "id": 2,
      "label": "test 2",
      "size": 500.0
    }
  ],
  "threshold": 10.0,
  "threshold_period": "hourly",
  "max_ms": 30000.0
}
Any of these fields can be missing at any time, and there can also be additional, different fields. It is guaranteed that a field will have the same data type for all occurrences.
For now, I only need the `threshold` and `threshold_period` fields, so I declared the column as follows:
CREATE TABLE probes (
  `objectives` ROW(`threshold` FLOAT, `threshold_period` STRING)
  ...
) WITH (
  ...
  'format' = 'debezium-json',
  'debezium-json.schema-include' = 'true',
  'debezium-json.ignore-parse-errors' = 'true'
)
However I keep getting `NULL` values in my `objectives` column, or corrupt JSON message exceptions when I disable the `ignore-parse-errors` option.
Does JSON parsing need to match 100% the schema of the field or is it lenient?
Is there any option or syntactic detail I'm missing?
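To make the question concrete, here is a minimal Python sketch of the lenient behaviour being asked about: declared fields that are absent come back as nulls, and undeclared fields are ignored. It models the expectation only and says nothing about how Flink's JSON format is actually implemented; `project_objectives` and `DECLARED_FIELDS` are hypothetical names introduced for illustration.

```python
import json
from typing import Optional

# Hypothetical model of lenient parsing, NOT Flink code: only the declared
# fields are kept, missing ones become None, undeclared ones are dropped.
DECLARED_FIELDS = ("threshold", "threshold_period")


def project_objectives(raw: Optional[str]) -> Optional[dict]:
    """Project a JSON object onto DECLARED_FIELDS, tolerating an open schema."""
    if raw is None:
        return None
    node = json.loads(raw)
    if not isinstance(node, dict):
        raise ValueError("expected a JSON object")
    return {name: node.get(name) for name in DECLARED_FIELDS}


# The full object from the question, and one with fields missing.
full = (
    '{"items": [{"id": 1, "label": "test 1", "size": 1000.0}], '
    '"threshold": 10.0, "threshold_period": "hourly", "max_ms": 30000.0}'
)
partial = '{"max_ms": 30000.0, "threshold": 10.0}'

print(project_objectives(full))     # both declared fields present
print(project_objectives(partial))  # threshold_period is absent, so None
```

Under this model neither input is an error, which is the outcome the `ROW(...)` declaration is hoped to produce.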
|
Hi Sebastian,
Did you try setting debezium-json-map-null-key-mode to DROP [1]? I'm also pulling in Timo who might know better.
[1] https://ci.apache.org/projects/flink/flink-docs-stable/dev/table/connectors/formats/debezium.html#debezium-json-map-null-key-mode
Regards,
Roman
On Fri, Mar 12, 2021 at 2:42 PM Magri, Sebastian <[hidden email]> wrote:
|
Hi Roman!
Seems like that option is no longer available.
Best Regards,
Sebastian
From: Roman Khachatryan <[hidden email]>
Sent: Friday, March 12, 2021 16:59
To: Magri, Sebastian <[hidden email]>; Timo Walther <[hidden email]>
Cc: user <[hidden email]>
Subject: Re: [Flink SQL] Leniency of JSON parsing
|
I validated it's still accepted by the connector but it's not in the documentation anymore.
It doesn't seem to help in my case.
Thanks,
Sebastian
From: Magri, Sebastian <[hidden email]>
Sent: Friday, March 12, 2021 18:50
To: Timo Walther <[hidden email]>; [hidden email] <[hidden email]>
Cc: user <[hidden email]>
Subject: Re: [Flink SQL] Leniency of JSON parsing
|
Hi Sebastian,
you can check out the logic yourself by looking into
https://github.com/apache/flink/blob/master/flink-formats/flink-json/src/main/java/org/apache/flink/formats/json/debezium/DebeziumJsonDeserializationSchema.java
and
https://github.com/apache/flink/blob/master/flink-formats/flink-json/src/main/java/org/apache/flink/formats/json/JsonRowDataDeserializationSchema.java
So actually your use case should work. Could you help investigate what is going wrong? In any case we should open an issue for it. It seems to be a bug.
Regards,
Timo
On 12.03.21 21:10, Magri, Sebastian wrote:
|
Thanks a lot Timo,
I will check those links out and create an issue with more information.
Best Regards,
Sebastian
From: Timo Walther <[hidden email]>
Sent: Tuesday, March 16, 2021 15:29
To: Magri, Sebastian <[hidden email]>; [hidden email] <[hidden email]>
Cc: user <[hidden email]>
Subject: Re: [Flink SQL] Leniency of JSON parsing
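As a possible starting point for the investigation Timo asks for, one cheap check is how `objectives` is encoded in a raw change event before it reaches Flink. The sketch below assumes the usual Debezium envelope with the schema included, that is `{"schema": ..., "payload": {"after": ...}}`. `classify_after_field` is a hypothetical helper, and the two sample events are made up for illustration, not captured from a real topic.

```python
import json


def classify_after_field(message: str, field: str) -> str:
    """Report how `field` is encoded in the `after` image of a change event."""
    envelope = json.loads(message)
    # With the schema included, the event is wrapped as
    # {"schema": {...}, "payload": {...}}; otherwise the payload is top level.
    payload = envelope.get("payload", envelope)
    after = payload.get("after") or {}
    if field not in after:
        return "missing"
    value = after[field]
    if value is None:
        return "null"
    if isinstance(value, dict):
        return "object"
    if isinstance(value, str):
        return "string"
    return type(value).__name__


# Made-up sample events: one carries a nested object, the other carries the
# same data as a JSON-encoded string.
nested = json.dumps(
    {"schema": {}, "payload": {"after": {"objectives": {"threshold": 10.0}}, "op": "c"}}
)
encoded = json.dumps(
    {"schema": {}, "payload": {"after": {"objectives": '{"threshold": 10.0}'}, "op": "c"}}
)

print(classify_after_field(nested, "objectives"))
print(classify_after_field(encoded, "objectives"))
```

If the field turned out to arrive as a string rather than a nested object, a `ROW(...)` declaration would presumably not line up with it. That is a hypothesis to verify against real messages, not a confirmed diagnosis of the behaviour reported in this thread.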