Hello Vasia and Greg,Thank you for the feedback!I am probably misusing the Gelly API in some way, but I thought I could run the undirected version after calling getUndirected()?While not going into the concept of local clustering coefficients, I thought that from a Gelly API point-of-view, both my code and data set were properly established.However:- I believe that the graph was already undirected;- I am getting NaN results after executing the algorithm.This is the code I am using to obtain an (undirected) graph instance upon which I call LocalClusteringCoefficient:
import org.apache.flink.graph.library.clustering.undirected. LocalClusteringCoefficient. Result;
import org.apache.flink.graph.library.clustering.undirected. LocalClusteringCoefficient; /** other imports and method definitions **/// Generate edge tuples from the input file.
final DataSet<Tuple2<LongValue, LongValue>> edgeTuples = env.readCsvFile(inputPath)
.fieldDelimiter("\t") // node IDs are separated by spaces
.ignoreComments("#") // comments start with "%"
.types(LongValue.class, LongValue.class);
// Generate actual Edge<Long, Double> instances.
@SuppressWarnings("serial")
final DataSet<Edge<LongValue, Double>> edges = edgeTuples.map(
new MapFunction<Tuple2<LongValue, LongValue>, Edge<LongValue, Double>>() {
@Override
public Edge<LongValue, Double> map(Tuple2<LongValue, LongValue> arg0) throws Exception {
return new Edge<LongValue, Double>(arg0.f0, arg0.f1, 1.0d);
}
});
// Generate the basic graph.
@SuppressWarnings("serial")
final Graph<LongValue, Double, Double> graph = Graph.fromDataSet(
edges,
new MapFunction<LongValue, Double>() {
@Override
public Double map(LongValue arg0) throws Exception {// For testing purposes, just setting each vertex value to 1.0.return 1.0;
}
},
env).getUndirected();// Execute the LocalClusteringCoefficient algorithm.final DataSet<Result<LongValue>> localClusteringCoefficients = graph.run(new LocalClusteringCoefficient<LongValue, Double, Double>()); // Get the values as per Vasia's help:
@SuppressWarnings("serial")
DataSet<Double> CLUSTERING_COEFFICIENTS = localClusteringCoefficients.map(new MapFunction<Result<LongValue>, Double>() {
@Override
public Double map(Result<LongValue> arg0) throws Exception {
return arg0.getLocalClusteringCoefficientS core();
}
});I believe this is the correct way to get a DataSet<Double> of coefficients from a DataSet<Result<LongValue>> ?Among the coefficients are a lot of NaN values:
CLUSTERING_COEFFICIENTS.print();
NaN
0.0
NaN
NaN
NaN
NaN
NaN
NaN
NaN
NaN
NaN
NaN
NaN
NaN
NaN
NaN
NaN
NaN
NaN
NaN
NaN
NaN
NaN
NaN
NaN
NaN
NaN
NaN
NaN
NaN
NaN
NaN
NaN
NaN
NaNApologies for the verbosity in advance, but just to provide detail, printing the graph edges yields this (notice that for each pair or vertices there are two links, which are original and the reverse version derived from getUndirected()).
Greg: I therefore believe the graph is undirected:
graph.getEdgesAsTuple3().print();
(5113,6008,1.0)
(6008,5113,1.0)
(5113,6774,1.0)
(6774,5113,1.0)
(5113,32938,1.0)
(32938,5113,1.0)
(5113,6545,1.0)
(6545,5113,1.0)
(5113,7088,1.0)
(7088,5113,1.0)
(5113,37929,1.0)
(37929,5113,1.0)
(5113,26562,1.0)
(26562,5113,1.0)
(5113,6107,1.0)
(6107,5113,1.0)
(5113,7171,1.0)
(7171,5113,1.0)
(5113,6192,1.0)
(6192,5113,1.0)
(5113,7763,1.0)
(7763,5113,1.0)
(9748,5113,1.0)
(5113,9748,1.0)
(10191,5113,1.0)
(5113,10191,1.0)
(6064,5113,1.0)
(5113,6064,1.0)
(6065,5113,1.0)
(5113,6065,1.0)
(6279,5113,1.0)
(5113,6279,1.0)
(4907,5113,1.0)
(5113,4907,1.0)
(6465,5113,1.0)
(5113,6465,1.0)
(6707,5113,1.0)
(5113,6707,1.0)
(7089,5113,1.0)
(5113,7089,1.0)
(7172,5113,1.0)
(5113,7172,1.0)
(14310,5113,1.0)
(5113,14310,1.0)
(6252,5113,1.0)
(5113,6252,1.0)
(33855,5113,1.0)
(5113,33855,1.0)
(7976,5113,1.0)
(5113,7976,1.0)
(26284,5113,1.0)
(5113,26284,1.0)
(8056,5113,1.0)
(5113,8056,1.0)
(10371,5113,1.0)
(5113,10371,1.0)
(16785,5113,1.0)
(5113,16785,1.0)
(19801,5113,1.0)
(5113,19801,1.0)
(6715,5113,1.0)
(5113,6715,1.0)
(31724,5113,1.0)
(5113,31724,1.0)
(32443,5113,1.0)
(5113,32443,1.0)
(10370,5113,1.0)
(5113,10370,1.0)Any insight into what I may be doing wrong would be greatly appreciated.Thanks for your time,Kind regards,Miguel E. Coimbra
Email: [hidden email]
Skype: miguel.e.coimbraOn 20 January 2017 at 19:31, Greg Hogan <[hidden email]> wrote:Hi Miguel,
The '--output print' option describes the values and also displays the local clustering coefficient value.You're running the undirected algorithm on a directed graph. In 1.2 there is an option '--simplify true' that will add reverse edges and remove duplicate edges and self-loops. Alternatively, it looks like you could simply add reverse edges to your input file (with an optional ' | sort | uniq' following):
$ cat edges.txt | awk ' { print $1, $2; print $2, $1 } 'The drivers are being reworked for 1.3 to better reuse code and options which will better support additional drivers and algorithms and make documentation simpler.GregOn Fri, Jan 20, 2017 at 2:06 PM, Vasiliki Kalavri <[hidden email]> wrote:Hi Miguel,the LocalClusteringCoefficient algorithm returns a DataSet of type Result, which basically wraps a vertex id, its degree, and the number of triangles containing this vertex. The number 11 you see is indeed the degree of vertex 5113. The Result type contains the method getLocalClusteringCoefficientScore() which allows you to retrieve the clustering coefficient score for a vertex. The method simply divides the numbers of triangles by the number of potential edges between neighbors.I'm sorry that you this is not clear in the docs. We should definitely improve them to explain what is the output and how to retrieve the actual clustering coefficient values. I have opened a JIRA for this [1].Cheers,-Vasia.On 20 January 2017 at 19:31, Miguel Coimbra <[hidden email]> wrote:Hello,
In the documentation of the LocalClusteringCoefficient algorithm, it is said:
The local clustering coefficient measures the connectedness of each vertex’s neighborhood.
Scores range from 0.0 (no edges between neighbors) to 1.0 (neighborhood is a clique).
https://ci.apache.org/projects/flink/flink-docs-release-1.1/ apis/batch/libs/gelly.html#loc al-clustering-coefficient However, upon running the algorithm (undirected version), I obtained values above 1.The result I got was this. As you can see, vertex 5113 has a score of 11:
(the input edges for the graph are shown further below - around 35 edges):
(4907,(1,0))
(5113,(11,0))
(6008,(0,0))
(6064,(1,0))
(6065,(1,0))
(6107,(0,0))
(6192,(0,0))
(6252,(1,0))
(6279,(1,0))
(6465,(1,0))
(6545,(0,0))
(6707,(1,0))
(6715,(1,0))
(6774,(0,0))
(7088,(0,0))
(7089,(1,0))
(7171,(0,0))
(7172,(1,0))
(7763,(0,0))
(7976,(1,0))
(8056,(1,0))
(9748,(1,0))
(10191,(1,0))
(10370,(1,0))
(10371,(1,0))
(14310,(1,0))
(16785,(1,0))
(19801,(1,0))
(26284,(1,0))
(26562,(0,0))
(31724,(1,0))
(32443,(1,0))
(32938,(0,0))
(33855,(1,0))
(37929,(0,0))This was from a small isolated test with these edges:
5113 6008
5113 6774
5113 32938
5113 6545
5113 7088
5113 37929
5113 26562
5113 6107
5113 7171
5113 6192
5113 7763
9748 5113
10191 5113
6064 5113
6065 5113
6279 5113
4907 5113
6465 5113
6707 5113
7089 5113
7172 5113
14310 5113
6252 5113
33855 5113
7976 5113
<a href="tel:262%20845%20113" value="+351262845113" target="_blank">26284 5113
8056 5113
10371 5113
16785 5113
19801 5113
6715 5113
31724 5113
32443 5113
10370 5113I am not sure what I may be doing wrong, but is there perhaps some form of normalization lacking in my execution of:
org.apache.flink.graph.library.clustering.undirected.LocalCl usteringCoefficient.Result;
org.apache.flink.graph.library.clustering.undirected.LocalCl usteringCoefficient; Am I supposed to divide all scores by the greatest score obtained by the algorithm?Thank you very much!Miguel E. Coimbra
Email: [hidden email]
Skype: miguel.e.coimbra
Free forum by Nabble | Edit this page |