JSON data source for Flink Job

Tamara Mendt
Hello,

I have a JSON file containing multiple JSON objects and wish to use this as a data source for a Flink Job.

What is the best way to do this?

Cheers,

Tamara

Re: JSON data source for Flink Job

Stephan Ewen
Hi!

This depends a bit on how the JSON is formatted.

If you want the source to be parallelizable, you need a way of splitting the file at object boundaries. Is there a character on which you can split, i.e. one that only ever appears between objects and never inside them? If yes, you can use the TextInputFormat (with a custom line break character), take the resulting strings, and parse them to JSON with your favorite library (like Jackson or so).
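To illustrate the splitting idea outside of Flink, here is a minimal plain-Java sketch. It assumes the delimiter (a newline here) occurs only between JSON objects. The names `JsonRecordSplitter`, `splitRecords` and `looksLikeObject` are hypothetical and not part of Flink or Jackson; in an actual job the input format would do the splitting and each string would go to a real JSON parser.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical helper, for illustration only.
final class JsonRecordSplitter {

    // Splits raw file content at a delimiter that is assumed to appear only
    // between JSON objects, never inside one.
    static List<String> splitRecords(String content, String delimiter) {
        List<String> records = new ArrayList<>();
        // Pattern.quote makes split() treat the delimiter literally, not as a regex.
        for (String piece : content.split(Pattern.quote(delimiter))) {
            String record = piece.trim();
            if (!record.isEmpty()) {
                records.add(record);
            }
        }
        return records;
    }

    // Cheap sanity check that a record looks like one whole JSON object.
    // This is not a parser; a real job would use a JSON library here.
    static boolean looksLikeObject(String record) {
        return record.startsWith("{") && record.endsWith("}");
    }

    public static void main(String[] args) {
        String content = "{\"id\": 1, \"name\": \"a\"}\n{\"id\": 2, \"name\": \"b\"}\n";
        for (String record : splitRecords(content, "\n")) {
            System.out.println(looksLikeObject(record) + " " + record);
        }
    }
}
```

If the objects are pretty-printed across several lines, a newline delimiter would cut them apart, and `looksLikeObject` would return false for the fragments. In that case a different separator is needed, or the file has to be read without splitting.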

Stephan


On Thu, May 28, 2015 at 12:24 PM, Tamara Mendt <[hidden email]> wrote:
Hello,

I have a JSON file containing multiple JSON objects and wish to use this as a data source for a Flink Job.

What is the best way to do this?

Cheers,

Tamara


Re: JSON data source for Flink Job

Tamara Mendt
Ok great, thanks a lot =)



On Thu, May 28, 2015 at 12:39 PM, Stephan Ewen <[hidden email]> wrote:
Hi!

This depends a bit on how the JSON is formatted.

If you want the source to be parallelizable, you need a way of splitting the file at object boundaries. Is there a character on which you can split? If yes, you can use the TextInputFormat (with a custom line break character), take the resulting strings, and parse them to JSON with your favorite library (like Jackson or so).

Stephan


--
Tamara Mendt