This post was updated on .
I'm currently making a shortest path algorithm in Gelly using DataSets, here's a piece of the code i've started with:
public DataSet<Vertex<K,Long>> ShortestPathsEAT(K startingnode) { DataSet<Vertex<K,Long>> results = this.getVertices().distinct().map(new MapFunction<Vertex<K, VV>, Vertex<K, Long>>() { @Override public Vertex<K, Long> map(Vertex<K, VV> value) throws Exception { if (startingnode == value.getId()) { return new Vertex<K, Long>(value.getId(), 0L); } else { return new Vertex<K, Long>(value.getId(), Long.MAX_VALUE); } } }); return results; } For this i need the index of the starting node, now i could just pass this as a value, but i want to execute this algorithm on the first element from my DataSet, i know i can get this element with ,first(1) but that function returns a DataSet and not the actual element. I looked for a similar question and got to the following topic [1] Which kinda gave me the feeling that it's not possible at all to retrieve a single element from a set, is that correct? What would be a good way to fix this issue? [1] http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/Get-1-element-of-DataSet-td688.html |
Hi,
I'm also interested in that question/solution. For now, my workaround looks like this: > DataSet<...> .filter(... object.Id == NeededElement.Id ... ).collect().get(0) I filter the DataSet for the element I want to find, collect it into a List which then returns the first element. That's a bit ugly and probably performancewise not the best solution, but I haven't found a "Flink-ish"-way to do it either. Regards, Sebastian |
That is indeed not the nice way to do it because it will create an executionplan just to get that value, but it does work, so thnx for that
A more concrete example for what i want, In gelly you have the SingleSourceShortestPaths algorith which requires the sourceVertexId, now i want to execute a metric called the betweenness, this requires me to execute the algorithm on each of the nodes. If i need the id of every node that means i need to Collect() all those nodes as well, which means for a graph of 1000 nodes i have 1000+ execution plans |
It sounds like you want to use an all-pairs shortest paths algorithm. This would be a great contribution to Gelly! https://en.wikipedia.org/wiki/Shortest_path_problem#All-pairs_shortest_paths On Wed, Oct 26, 2016 at 9:29 AM, otherwise777 <[hidden email]> wrote: That is indeed not the nice way to do it because it will create an |
I created FLINK-4965 "AllPairsShortestPaths" and FLINK-4966 "BetweennessCentrality". On Wed, Oct 26, 2016 at 4:39 PM, Greg Hogan <[hidden email]> wrote:
|
Cool, thnx for that,
I tried searching for it in teh github but couldn't find it, do you have the url by any chance? I'm going to try to implement such an algorithm for temporal graphs |
The tickets are in Flink's Jira: Are you looking to process temporal graphs with the DataStream API?https://issues.apache.org/jira/browse/FLINK-4965 https://issues.apache.org/jira/browse/FLINK-4966 On Fri, Nov 4, 2016 at 5:52 AM, otherwise777 <[hidden email]> wrote: Cool, thnx for that, |
Hi all, @Wouter: I'm not sure I completely understand what you want to do, but would broadcast variables [1] help? @all: All-pairs-shortest-paths and betweenness centrality are very challenging algorithms to implement efficiently in a distributed way. APSP requires each vertex to store distances (or paths) for every other vertex in the graph. AFAIK there's no scalable distributed algorithm to compute these metrics. The solutions I'm aware of are (1) approximations and sketches (e.g. spanners), (2) replicating the whole graph to several nodes and compute paths in parallel, and (3) shared-memory implementations that exploit multi-core parallelism. How are you planning to implement these in Gelly? Cheers, -Vasia. On 4 November 2016 at 12:57, Greg Hogan <[hidden email]> wrote:
|
Hey there,
I don't really understand what Broadcast does, does it in a way export the elements from a DataSet in a Collection? Because then it might be what i'm looking for. when implementing algorithms in Flink Gelly i keep getting stuck on what i cannot do with DataSets. For example, if i want to determine the ShortestPath i need the SourceVertex, this can be inserted manually but this becomes a problem when you want to determine it for all the Vertices in a graph, which is what you need when determining certain metrics. Another example is looping over a DataSet, i found a topic which suggests using a mapping to loop/iterate over a DataSet, but since it needs to be serializable it cannot add elements to another Data Structure inside the mapping which makes it useless if you want to for example export some elements to another Data Structure. for the first example i could use collect(), but afaik it's more of a workaround in Flink to use it. About the APSP, this could become a problem for me since those are some of the things i originally wanted to do in my project. I didn't think about it yet how i'm going to implement it |
Free forum by Nabble | Edit this page |