Using Flink to analyze GDELT

Tamara Mendt
Hello!

I was wondering if anyone has tried to use Flink to analyze the GDELT database (http://gdeltproject.org/). It is a structured (CSV) repository of global events. It contains about 100 GB of data (approx. 250M events, with 50 attributes each) and is updated with new events every day.

I am a bit concerned that, since this is a structured database that is not too big, Flink may not be the ideal tool to work with it. Any insight?
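
For a rough sense of scale, the figures above can be turned into average record sizes. The sketch below is only back-of-the-envelope arithmetic: it assumes decimal gigabytes (1 GB = 10^9 bytes) and treats the quoted numbers as exact, which they are not. The class and method names are made up for illustration.

```java
// Back-of-the-envelope sizing from the figures quoted in the post.
// Assumption: decimal gigabytes (1 GB = 10^9 bytes); results are rough averages.
final class GdeltSizeEstimate {
    static final long TOTAL_BYTES = 100L * 1_000_000_000L; // "about 100 GB"
    static final long EVENTS = 250_000_000L;               // "approx. 250M events"
    static final int ATTRIBUTES = 50;                      // "50 attributes" per event

    // Average size of one event record, in bytes.
    static long bytesPerEvent() {
        return TOTAL_BYTES / EVENTS;
    }

    // Average size of one attribute value (delimiters included), in bytes.
    static long bytesPerAttribute() {
        return bytesPerEvent() / ATTRIBUTES;
    }

    public static void main(String[] args) {
        System.out.println("bytes per event: " + bytesPerEvent());
        System.out.println("bytes per attribute: " + bytesPerAttribute());
    }
}
```

Under those assumptions this comes to roughly 400 bytes per event and about 8 bytes per attribute, so the records are short and flat.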

Thanks!

--
Tamara Mendt

Re: Using Flink to analyze GDELT

Kostas Tzoumas
Hi Tamara!

I have not used GDELT, but it looks pretty cool!

You can certainly use Flink to analyze structured CSV files, and people have worked with both larger and smaller datasets using Flink.

So, you can certainly give Flink a spin. Whether Flink is the ideal tool also depends on what kind of analysis you want to run on this data. Posting some more details about your jobs would be helpful.
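
As a purely hypothetical illustration of the kind of job detail that helps, one common shape is "count events per value of one column". The sketch below uses only the Java standard library, not the Flink API, and everything in it is an assumption: the class name, the comma delimiter, the column index, and the sample rows are invented and do not reflect the real GDELT schema. The naive split also ignores quoted fields.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.regex.Pattern;

// Hypothetical sketch: count records per value of one column.
// Plain Java only; this is not Flink code and not the GDELT schema.
final class EventCountSketch {

    // Splits each line on the literal delimiter and counts occurrences of
    // the value found at the given zero-based column index.
    static Map<String, Long> countByColumn(List<String> lines, String delimiter, int column) {
        Pattern splitter = Pattern.compile(Pattern.quote(delimiter));
        Map<String, Long> counts = new TreeMap<>(); // sorted keys for stable output
        for (String line : lines) {
            String[] fields = splitter.split(line, -1); // -1 keeps trailing empty fields
            if (column >= fields.length) {
                continue; // skip rows that are too short to contain the column
            }
            counts.merge(fields[column], 1L, Long::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        // Made-up rows: id, date, country code.
        List<String> sample = Arrays.asList(
                "1,20141107,US",
                "2,20141107,DE",
                "3,20141108,US");
        System.out.println(countByColumn(sample, ",", 2)); // prints {DE=1, US=2}
    }
}
```

A description at this level (which columns are grouped on, what is aggregated, how often the job reruns as new events arrive) is the sort of detail that makes it easier to say whether Flink is a good fit.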

Kostas


On Fri, Nov 7, 2014 at 10:46 AM, Tamara Mendt <[hidden email]> wrote:
> Hello!
>
> I was wondering if anyone has tried to use Flink to analyze the GDELT database (http://gdeltproject.org/). It is a structured (CSV) repository of global events. It contains about 100 GB of data (approx. 250M events, with 50 attributes each) and is updated with new events every day.
>
> I am a bit concerned that, since this is a structured database that is not too big, Flink may not be the ideal tool to work with it. Any insight?
>
> Thanks!
>
> --
> Tamara Mendt