store and retrieve Graph object

Stefanos Antaris
Hi all,

I am working on a project with Gelly and need to create a graph with billions of nodes. Although I have the edge list, each node in the graph needs to be a POJO, and constructing these objects takes a long time before the final graph can be built. Is it possible to store the Graph object as a file and retrieve it whenever I want to run an experiment?

Thanks,
Stefanos

Re: store and retrieve Graph object

Vasiliki Kalavri
Hi Stefane,

Let me check whether I understand the problem correctly. The vertex values are POJOs that you're somehow inferring from the edge list, and this value creation is what takes a lot of time? Since a graph is just a pair of datasets (vertices and edges), you could store the values to disk and use a custom input format to read them back into datasets. Would that work for you?

-Vasia.

On 25 November 2015 at 15:09, Stefanos Antaris <[hidden email]> wrote:
Hi to all,

i am working on a project with Gelly and i need to create a graph with billions of nodes. Although i have the edge list, the node in the Graph needs to be a POJO object, the construction of which takes long time in order to finally create the final graph. Is it possible to store the Graph object as a file and retrieve it whenever i want to run an experiment?

Thanks,
Stefanos

Re: store and retrieve Graph object

Stefanos Antaris
Hi Vasia,

My graph object is the following:

Graph<MyPojoNode, NullValue, Integer> graph = Graph.fromCollection(edgeList.collect(), env);

The vertex ID itself is the POJO, not the vertex value. So the problem is: how can I store and retrieve the vertex list?

Thanks,
Stefanos

On 25 Nov 2015, at 18:16, Vasiliki Kalavri <[hidden email]> wrote:

Hi Stefane,

let me know if I understand the problem correctly. The vertex values are POJOs that you're somehow inferring from the edge list and this value creation is what takes a lot of time? Since a graph is just a set of 2 datasets (vertices and edges), you could store the values to disk and have a custom input format to read them into datasets. Would that work for you?

-Vasia.

Re: store and retrieve Graph object

Vasiliki Kalavri
Hey,

You can preprocess your data, create the vertices, and store them to a file like you would store any other Flink DataSet, e.g. with writeAsText.

Then you can create the graph by reading the two datasets, like this:

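// Vertex IDs are MyPojoNode, vertex values are NullValue, edge values are Integer
DataSet<Vertex<MyPojoNode, NullValue>> vertices = env.readTextFile("/path/to/vertices/")... // or your custom reading logic
DataSet<Edge<MyPojoNode, Integer>> edges = ...

Graph<MyPojoNode, NullValue, Integer> graph = Graph.fromDataSet(vertices, edges, env);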

Is this what you're looking for?

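Also, note that if you have a very large graph, you should avoid using collect() and fromCollection(): collect() ships the entire DataSet to the client program, and fromCollection() then builds the graph from that in-memory collection.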
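
The "custom reading logic" mentioned above could look roughly like the plain-Java sketch below. It only illustrates the text round trip for one vertex per line and does not use Flink itself. The class VertexStore, the fields id and label, and the tab-separated line format are all assumptions made for illustration, since the thread does not show what MyPojoNode contains. In a Flink job, something like toLine would presumably be applied in a map before writeAsText, and fromLine in a map after readTextFile.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class VertexStore {

    /** Hypothetical stand-in for MyPojoNode; the real fields are not shown in the thread. */
    public static class MyPojoNode {
        public long id;
        public String label;

        // Public no-arg constructor, which Flink expects of POJO types.
        public MyPojoNode() {}

        public MyPojoNode(long id, String label) {
            this.id = id;
            this.label = label;
        }
    }

    /** Encodes one vertex as a tab-separated line (assumes the label has no tab or newline). */
    public static String toLine(MyPojoNode node) {
        return node.id + "\t" + node.label;
    }

    /** Decodes a line produced by toLine back into a vertex object. */
    public static MyPojoNode fromLine(String line) {
        String[] parts = line.split("\t", 2);
        return new MyPojoNode(Long.parseLong(parts[0]), parts[1]);
    }

    /** Writes one vertex per line to the given file. */
    public static void save(List<MyPojoNode> nodes, Path file) throws IOException {
        List<String> lines = new ArrayList<>();
        for (MyPojoNode node : nodes) {
            lines.add(toLine(node));
        }
        Files.write(file, lines, StandardCharsets.UTF_8);
    }

    /** Reads the vertices back, skipping empty lines. */
    public static List<MyPojoNode> load(Path file) throws IOException {
        List<MyPojoNode> nodes = new ArrayList<>();
        for (String line : Files.readAllLines(file, StandardCharsets.UTF_8)) {
            if (!line.isEmpty()) {
                nodes.add(fromLine(line));
            }
        }
        return nodes;
    }

    public static void main(String[] args) throws IOException {
        Path file = Files.createTempFile("vertices", ".tsv");
        save(Arrays.asList(new MyPojoNode(1L, "a"), new MyPojoNode(2L, "b")), file);
        List<MyPojoNode> restored = load(file);
        System.out.println(restored.size() + " vertices restored");
        Files.delete(file);
    }
}
```

The expensive vertex construction then runs only once. Later experiments pay just the cost of parsing the stored lines, and the same idea applies to the edge dataset.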

-Vasia.

On 25 November 2015 at 18:03, Stefanos Antaris <[hidden email]> wrote:
Hi Vasia,

my graph object is the following: 

Graph<MyPojoNode, NullValue, Integer> graph = Graph.fromCollection(edgeList.collect(), env);

The vertex is a POJO not the value. So the problem is how could i store and retrieve the vertex list? 

Thanks,
Stefanos

Re: store and retrieve Graph object

Stefanos Antaris
Hi,

It works fine using this approach. 

Thanks,
Stefanos

On 25 Nov 2015, at 20:32, Vasiliki Kalavri <[hidden email]> wrote:

Hey,

you can preprocess your data, create the vertices and store them to a file, like you would store any other Flink DataSet, e.g. with writeAsText.

Then, you can create the graph by reading 2 datasets, like this:

DataSet<Vertex> vertices = env.readTextFile("/path/to/vertices/")... // or your custom reading logic
DataSet<Edge> edges = ...

Graph graph = Graph.fromDataSet(vertices, edges, env);

Is this what you're looking for?

Also, note that if you have a very large graph, you should avoid using collect() and fromCollection().

-Vasia.

Re: store and retrieve Graph object

Vasiliki Kalavri
Good to know :)

On 25 November 2015 at 21:44, Stefanos Antaris <[hidden email]> wrote:
Hi,

It works fine using this approach. 

Thanks,
Stefanos
