Get nested Rows from Json string

Get nested Rows from Json string

françois lacombe
Hi all,

I currently receive a JSON string with nested objects from my PostgreSQL source, and I need to convert it into a Flink Row.
Nested JSON objects should become nested Rows.
An Avro schema defines the structure my source data should conform to.

Given this JSON:
{
  "a":"b",
  "c":"d",
  "e":{
       "f":"g"
   }
}

("b", "d", Row("g")) is the expected result according to my Avro schema.

I wrote a recursive method that iterates over the JSON objects and puts nested Rows at the right indices in their parent, but here is what it outputs: ("b", "d", "g")
The child Row appears to be appended to the parent, and I don't understand why.
As a result, the job crashes, complaining that the arity of the top-level Row doesn't match the serializers.
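
The recursive conversion described above can be sketched roughly as follows. This is only an illustration under stated assumptions: `SimpleRow` is a hypothetical stand-in for Flink's `Row` (it only borrows the accessor names `getArity`, `getField` and `setField`), `NestedRowSketch`, `toRow` and `sample` are made-up names, the JSON is assumed to be already parsed into nested maps, and fields are placed in key order rather than by looking positions up in an Avro schema.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical stand-in for org.apache.flink.types.Row (illustration only):
// a fixed-arity array of fields with the same accessor names.
class SimpleRow {
    private final Object[] fields;

    SimpleRow(int arity) { this.fields = new Object[arity]; }

    int getArity() { return fields.length; }

    Object getField(int pos) { return fields[pos]; }

    void setField(int pos, Object value) { fields[pos] = value; }
}

class NestedRowSketch {
    // Recursively converts a parsed JSON object into a row, one field per key,
    // in key order. A nested object becomes a nested SimpleRow in a single slot.
    static SimpleRow toRow(Map<String, Object> json) {
        SimpleRow row = new SimpleRow(json.size());
        int i = 0;
        for (Object value : json.values()) {
            if (value instanceof Map) {
                @SuppressWarnings("unchecked")
                Map<String, Object> child = (Map<String, Object>) value;
                row.setField(i, toRow(child));
            } else {
                row.setField(i, value);
            }
            i++;
        }
        return row;
    }

    // Builds {"a":"b","c":"d","e":{"f":"g"}} by hand, assuming the JSON
    // string was already parsed by some JSON library.
    static Map<String, Object> sample() {
        Map<String, Object> e = new LinkedHashMap<>();
        e.put("f", "g");
        Map<String, Object> root = new LinkedHashMap<>();
        root.put("a", "b");
        root.put("c", "d");
        root.put("e", e);
        return root;
    }

    public static void main(String[] args) {
        SimpleRow row = toRow(sample());
        System.out.println(row.getArity());                        // 3
        System.out.println(row.getField(2) instanceof SimpleRow);  // true
    }
}
```

Checking the arity and the type of the third field, rather than the printed form, shows whether the child really sits in one slot of the parent.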

Are there native methods in Flink to achieve this?
I'm not comfortable having written my own JSON processor for this job.

Do you have any hints that could help?

All the best

François



      

Think of the planet: print this document only if necessary.
Re: Get nested Rows from Json string

Rong Rong
Hi François,

I wasn't sure whether you are trying to process a JSON object or a JSON string.
For a JSON string, this article [1] might help.
For a JSON object, I assume you are trying to convert it into a TableSource and process it with the Table/SQL API; you could probably use the example here [2].

BTW, a very remote hunch: this might just be a stringification issue in how you print the row.

--
Rong


On Wed, Feb 6, 2019 at 3:06 AM françois lacombe <[hidden email]> wrote:

Re: Get nested Rows from Json string

françois lacombe
Hi Rong,

Thank you for this answer.
I've changed the Rows to Maps, which eases the conversion process.

Nevertheless, I'm interested in any explanation of why row1.setField(i, row2) appends row2 at the end of row1.

All the best

François

On Wed, Feb 6, 2019 at 7:33 PM Rong Rong <[hidden email]> wrote:

Re: Get nested Rows from Json string

Rong Rong
Hi François,

I just did some research, and it seems this is in fact a stringification issue.
If you run one of the AvroRowDeSerializationSchemaTest tests [1], you will find that only MAP and ARRAY values are stringified with delimiters (maps are enclosed in "{}" and arrays in "[]").
Nested records, however, are not enclosed in "()".

I wasn't sure whether this should be considered a bug in the toString method of the Row type. I just filed a JIRA [2] for this issue; feel free to comment on the discussion.
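
To illustrate the effect, here is a minimal sketch. It does not use Flink itself: `FlatRow` is a hypothetical stand-in whose toString() simply joins the fields with commas and adds no delimiters, which is the behaviour described above, and `RowPrintingSketch`, `render` and `sample` are made-up names. The printed form looks flattened, yet the parent still has three fields.

```java
import java.util.StringJoiner;

// Hypothetical stand-in for Flink's Row: toString() joins the fields with
// commas and adds no delimiters around the row.
class FlatRow {
    private final Object[] fields;

    FlatRow(int arity) { fields = new Object[arity]; }

    int getArity() { return fields.length; }

    Object getField(int pos) { return fields[pos]; }

    void setField(int pos, Object value) { fields[pos] = value; }

    @Override
    public String toString() {
        StringJoiner joiner = new StringJoiner(",");
        for (Object field : fields) {
            joiner.add(String.valueOf(field));
        }
        return joiner.toString();
    }
}

class RowPrintingSketch {
    // Renders a row with explicit parentheses so nesting stays visible.
    static String render(FlatRow row) {
        StringJoiner joiner = new StringJoiner(",", "(", ")");
        for (int i = 0; i < row.getArity(); i++) {
            Object field = row.getField(i);
            joiner.add(field instanceof FlatRow
                    ? render((FlatRow) field)
                    : String.valueOf(field));
        }
        return joiner.toString();
    }

    // Builds the parent row ("b", "d", Row("g")) from the thread's example.
    static FlatRow sample() {
        FlatRow child = new FlatRow(1);
        child.setField(0, "g");
        FlatRow parent = new FlatRow(3);
        parent.setField(0, "b");
        parent.setField(1, "d");
        parent.setField(2, child);
        return parent;
    }

    public static void main(String[] args) {
        FlatRow parent = sample();
        System.out.println(parent);            // b,d,g
        System.out.println(parent.getArity()); // 3
        System.out.println(render(parent));    // (b,d,(g))
    }
}
```

A helper like `render` is one possible way to debug nested rows until the string form itself distinguishes them.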

--
Rong


On Fri, Feb 8, 2019 at 8:51 AM françois lacombe <[hidden email]> wrote:

Re: Get nested Rows from Json string

françois lacombe
Hi Rong,

Thank you for the JIRA.
Understood, it may be solved in a future release. I'll comment on the ticket if I have further input.

All the best

François

On Sat, Feb 9, 2019 at 12:57 AM Rong Rong <[hidden email]> wrote: