LDBC Graph Data into Flink

LDBC Graph Data into Flink

Martin Junghanns-2
Hi all,

For our benchmarks with Flink, we are using a data generator provided by
the LDBC project (Linked Data Benchmark Council) [1][2]. The generator
uses MapReduce to create directed, labeled, attributed graphs that mimic
properties of real online social networks (e.g., degree distribution,
diameter). The output is stored in several files, either locally or in
HDFS. Each file represents a vertex, edge or multi-valued property class.

I wrote a little tool that parses and transforms the LDBC output into
two datasets representing vertices and edges. Each vertex has a unique
id, a label and payload according to the LDBC schema. Each edge has a
unique id, a label, source and target vertex IDs, and also payload
according to the schema.
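For illustration, here is a self-contained sketch of what such a parsing step could look like. The class and method names are made up for this example (they are not the tool's actual API); it only assumes that LDBC datagen output files are '|'-separated CSV and that the entity type can be read from the file name:

```java
import java.util.Arrays;

// Hypothetical, simplified stand-alone sketch of the parsing step.
// Each LDBC output file is a '|'-separated CSV; the first column is the
// entity id, the remaining columns are payload.
public class LdbcParseSketch {

    public static final class VertexRecord {
        public final long id;
        public final String label;
        public final String[] payload;

        public VertexRecord(long id, String label, String[] payload) {
            this.id = id;
            this.label = label;
            this.payload = payload;
        }
    }

    // Derive the label from the file name, e.g. "person_0_0.csv" -> "person".
    public static String labelFromFileName(String fileName) {
        int cut = fileName.indexOf('_');
        return cut > 0 ? fileName.substring(0, cut) : fileName;
    }

    // Parse one '|'-separated line into a labeled vertex record.
    public static VertexRecord parseVertexLine(String label, String line) {
        String[] fields = line.split("\\|");
        long id = Long.parseLong(fields[0]);
        String[] payload = Arrays.copyOfRange(fields, 1, fields.length);
        return new VertexRecord(id, label, payload);
    }

    public static void main(String[] args) {
        VertexRecord v = parseVertexLine(
            labelFromFileName("person_0_0.csv"),
            "933|Mahinda|Perera|male");
        System.out.println(v.id + " " + v.label + " " + Arrays.toString(v.payload));
    }
}
```

In the real tool the resulting records would be emitted as a Flink DataSet rather than plain objects.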

I thought this might be useful for others, so I put it on GitHub [3].
It currently uses Flink 0.10-SNAPSHOT, as it depends on some fixes made
there.

Best,
Martin

[1] http://ldbcouncil.org/
[2] https://github.com/ldbc/ldbc_snb_datagen
[3] https://github.com/s1ck/ldbc-flink-import

Re: LDBC Graph Data into Flink

Vasiliki Kalavri
Hi Martin,

Thanks a lot for sharing! This is a very useful tool.
I only had a quick look, but if we merge the label and payload inside a Tuple2, then it should also be Gelly-compatible :)
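The merge Vasia suggests can be sketched without a Flink dependency. In the snippet below, `java.util.AbstractMap.SimpleEntry` stands in for Flink's `Tuple2`, and the record shape (id, label, payload) mirrors the tool's vertex layout; all names are illustrative, not actual APIs:

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.Map;

// Illustrative, Flink-free sketch: fold label and payload into one value
// so a vertex becomes (id, (label, payload)). This matches the shape of
// Gelly's Vertex<K, VV>, where VV must be a single value type.
public class GellyShapeSketch {

    public static SimpleEntry<Long, Map.Entry<String, String[]>> toGellyVertex(
            long id, String label, String[] payload) {
        // In Gelly terms: Vertex<Long, Tuple2<String, String[]>>.
        return new SimpleEntry<>(id, new SimpleEntry<>(label, payload));
    }

    public static void main(String[] args) {
        SimpleEntry<Long, Map.Entry<String, String[]>> v =
            toGellyVertex(933L, "person", new String[] {"Mahinda"});
        System.out.println(v.getKey() + " -> " + v.getValue().getKey());
    }
}
```

In Flink itself this would be a single `map()` over the vertex DataSet that wraps each record's label and payload in a `Tuple2`.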

Cheers,
Vasia.



Re: LDBC Graph Data into Flink

Martin Junghanns-2
Hi Vasia,

No problem. Sure, Gelly is just a map() call away :)

Best,
Martin


Re: LDBC Graph Data into Flink

Martin Junghanns-2
Hi,

I wrote a short blog post about the ldbc-flink tool, including a brief
overview of Flink and a Gelly example.

http://ldbcouncil.org/blog/ldbc-and-apache-flink

Best,
Martin


Re: LDBC Graph Data into Flink

Vasiliki Kalavri
Great, thanks for sharing, Martin!


Re: LDBC Graph Data into Flink

Till Rohrmann
Nice blog post Martin!
