Hi all,
I am working on a project with Gelly and I need to create a graph with billions of nodes. I already have the edge list, but each node in the graph needs to be a POJO, and constructing these objects takes a long time before the final graph is ready. Is it possible to store the Graph object in a file and load it whenever I want to run an experiment? Thanks, Stefanos
|
Hi Stefane, let me check that I understand the problem correctly. The vertex values are POJOs that you're somehow inferring from the edge list, and this value creation is what takes a lot of time? Since a graph is just a set of two datasets (vertices and edges), you could store the values to disk and use a custom input format to read them back into datasets. Would that work for you? -Vasia. On 25 November 2015 at 15:09, Stefanos Antaris <[hidden email]> wrote:
|
Hi Vasia,
my graph object is the following:
    Graph<MyPojoNode, NullValue, Integer> graph = Graph.fromCollection(edgeList.collect(), env);
The vertex ID is the POJO, not the vertex value. So the problem is: how can I store and retrieve the vertex list? Thanks, Stefanos
|
Hey, you can preprocess your data, create the vertices, and store them to a file, like you would store any other Flink DataSet, e.g. with writeAsText. Then you can create the graph by reading two datasets, like this:
    DataSet<Vertex> vertices = env.readTextFile("/path/to/vertices/")... // or your custom reading logic
    DataSet<Edge> edges = ...
    Graph graph = Graph.fromDataSet(vertices, edges, env);
Is this what you're looking for? Also, note that if you have a very large graph, you should avoid using collect() and fromCollection(). -Vasia. On 25 November 2015 at 18:03, Stefanos Antaris <[hidden email]> wrote:
|
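For anyone reading along, the "custom reading logic" suggested above can be sketched without any Flink dependency as a small line codec. This is only an illustration under assumptions: the thread never shows the fields of MyPojoNode, so the `id` and `label` fields, the tab-separated format, and the `VertexLineCodec` helper are all hypothetical.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Objects;

// Hypothetical vertex POJO; the real MyPojoNode fields are not shown in the thread.
// For Flink to treat it as a POJO, the real class would be public with a
// public no-argument constructor.
class MyPojoNode {
    public long id;
    public String label;

    public MyPojoNode() {}

    public MyPojoNode(long id, String label) {
        this.id = id;
        this.label = label;
    }

    @Override
    public boolean equals(Object o) {
        if (!(o instanceof MyPojoNode)) return false;
        MyPojoNode other = (MyPojoNode) o;
        return id == other.id && Objects.equals(label, other.label);
    }

    @Override
    public int hashCode() {
        return Objects.hash(id, label);
    }
}

// Hypothetical helper: one vertex per text line, formatted as "<id>\t<label>".
// Assumes the label contains no line breaks.
class VertexLineCodec {
    static String toLine(MyPojoNode node) {
        return node.id + "\t" + node.label;
    }

    static MyPojoNode fromLine(String line) {
        // Limit 2 keeps any further tabs inside the label intact.
        String[] parts = line.split("\t", 2);
        if (parts.length != 2) {
            throw new IllegalArgumentException("Malformed vertex line: " + line);
        }
        return new MyPojoNode(Long.parseLong(parts[0]), parts[1]);
    }

    public static void main(String[] args) throws IOException {
        List<MyPojoNode> vertices =
                Arrays.asList(new MyPojoNode(1L, "a"), new MyPojoNode(2L, "b"));

        // Write the expensive-to-build vertices once...
        List<String> lines = new ArrayList<>();
        for (MyPojoNode v : vertices) {
            lines.add(toLine(v));
        }
        Path file = Files.createTempFile("vertices", ".txt");
        Files.write(file, lines);

        // ...and parse them back on later runs instead of rebuilding them.
        List<MyPojoNode> restored = new ArrayList<>();
        for (String line : Files.readAllLines(file)) {
            restored.add(fromLine(line));
        }
        System.out.println("Round trip preserved vertices: " + restored.equals(vertices));
        Files.delete(file);
    }
}
```

In an actual job, something like `toLine` would presumably run in a map step before writeAsText, and `fromLine` in a map step after readTextFile, with each parsed node then wrapped in a Gelly Vertex before calling Graph.fromDataSet.
|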
Hi,
It works fine using this approach. Thanks, Stefanos
|
Good to know :) On 25 November 2015 at 21:44, Stefanos Antaris <[hidden email]> wrote:
|