Json filter

Luca Ferrari
Hello,

I’d like to apply a filter on an attribute of JSON documents. For example, I have a document like this:
{
    "time": "2014-08-15T21:00:00.000",
    "longitude": "136.048724",
    "latitude": "33.8144776",
    "altitude": "NULL",
    "rainfall": 0,
    "city_name": "Tokyo",
    "station_name": "NULL"
},
{
    "time": "2014-08-15T21:40:00.000",
    "longitude": "139.0634281",
    "latitude": "36.3894816",
    "altitude": "NULL",
    "rainfall": 6.5,
    "city_name": "Tokyo",
    "station_name": "NULL"
},
{
    "time": "2014-08-15T21:50:00.000",
    "longitude": "138.4768306",
    "latitude": "36.2488683",
    "altitude": "NULL",
    "rainfall": 8,
    "city_name": "Tokyo",
    "station_name": "NULL"
}
I need to filter the records to keep only those with rainfall > 6.

How should I handle the JSON parsing and the filtering?

Thanks
Luca

Re: Json filter

Fabian Hueske-2
Hi Luca,

Parsing JSON can be tricky if your schema is nested.
For a flat schema like yours, you can read the JSON records like this:

ExecutionEnvironment env = ...
DataSet<String> jsonRaw = env.readFileOfPrimitives(path, "},", String.class); // "}," is a sequence that uniquely delimits your records

This will give you a raw JSON String for each record (without the closing curly brace, which is part of the delimiter). You can then implement a Map function that uses any JSON parser (e.g., Jackson) to convert each record into a Java object, and a Filter function that operates on that object.
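A minimal, self-contained sketch of that map-then-filter step follows. So that it runs without a Flink cluster or extra dependencies, it makes several substitutions, all assumptions rather than Flink's or Jackson's actual API: java.util.function.Function and Predicate stand in for Flink's MapFunction and FilterFunction, splitting a string on "}," imitates what readFileOfPrimitives(path, "},", String.class) produces, and a small regex extractor replaces Jackson (it only works for a flat schema without escaped quotes, like the one in the question). The class name RainfallFilter, the Reading type, and the sample trimmed to three fields are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;
import java.util.function.Predicate;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RainfallFilter {

    // Hypothetical record type holding only the fields this example needs.
    static final class Reading {
        final String time;
        final double rainfall;

        Reading(String time, double rainfall) {
            this.time = time;
            this.rainfall = rainfall;
        }
    }

    // The question's records, trimmed to three fields, in the same "},"-separated layout.
    static final String SAMPLE =
        "{\n  \"time\": \"2014-08-15T21:00:00.000\",\n  \"rainfall\": 0,\n  \"city_name\": \"Tokyo\"\n},\n"
      + "{\n  \"time\": \"2014-08-15T21:40:00.000\",\n  \"rainfall\": 6.5,\n  \"city_name\": \"Tokyo\"\n},\n"
      + "{\n  \"time\": \"2014-08-15T21:50:00.000\",\n  \"rainfall\": 8,\n  \"city_name\": \"Tokyo\"\n}\n";

    // Matches "key": "string" or "key": number (assumes a flat schema, no escaped quotes).
    private static final Pattern FIELD =
        Pattern.compile("\"(\\w+)\"\\s*:\\s*(\"[^\"]*\"|-?[0-9.]+)");

    // Stand-in for the Map function: raw record string -> Reading.
    static final Function<String, Reading> PARSE = raw -> {
        String json = raw.trim();
        // The "}," delimiter is consumed, so every record except the last lacks its closing brace.
        // A real parser such as Jackson needs it back, so restore it to get a complete object.
        if (!json.endsWith("}")) {
            json = json + "}";
        }
        String time = null;
        double rainfall = Double.NaN; // stays NaN if the field is missing, and NaN > 6 is false
        Matcher m = FIELD.matcher(json);
        while (m.find()) {
            String key = m.group(1);
            String value = m.group(2);
            if (key.equals("time")) {
                time = value.substring(1, value.length() - 1); // strip the surrounding quotes
            } else if (key.equals("rainfall")) {
                rainfall = Double.parseDouble(value);
            }
        }
        return new Reading(time, rainfall);
    };

    // Stand-in for the Filter function: keep readings with rainfall > 6.
    static final Predicate<Reading> HEAVY_RAIN = r -> r.rainfall > 6;

    // Imitates splitting the file on "}," and then applying the map and the filter.
    static List<Reading> filterHeavyRain(String input) {
        List<Reading> result = new ArrayList<>();
        for (String raw : input.split(Pattern.quote("},"))) {
            if (raw.trim().isEmpty()) {
                continue; // skip blank chunks, if any
            }
            Reading r = PARSE.apply(raw);
            if (HEAVY_RAIN.test(r)) {
                result.add(r);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        for (Reading r : filterHeavyRain(SAMPLE)) {
            System.out.println(r.time + " rainfall=" + r.rainfall);
        }
    }
}
```

Running main prints the 21:40 and 21:50 readings (rainfall 6.5 and 8.0). In an actual Flink job, the body of PARSE would go in the map() method of a MapFunction applied to jsonRaw (ideally parsing with Jackson's ObjectMapper instead of the regex), and the rainfall condition in the filter() method of a FilterFunction.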

Let me know if you have further questions.

Cheers, Fabian

2015-07-07 11:18 GMT+02:00 Luca Ferrari <[hidden email]>:

Re: Json filter

Luca Ferrari
Thank you very much. That was very helpful.

Cheers
Luca

From: Fabian Hueske
Reply-To: <[hidden email]>
Date: Tuesday, 7 July 2015 11:36
To: <[hidden email]>
Subject: Re: Json filter