http://deprecated-apache-flink-user-mailing-list-archive.369.s1.nabble.com/Apache-Flink-1-1-4-Gelly-LocalClusteringCoefficient-Returning-values-above-1-tp11187p11195.html
Hello Vasia and Greg,
Thank you for the feedback!
I am probably misusing the Gelly API in some way, but I thought I could run the undirected version after calling getUndirected()?
Setting aside the concept of local clustering coefficients itself, I thought that from a Gelly API point of view both my code and my data set were set up correctly.
However:
- I believe that the graph was already undirected;
- I am getting NaN results after executing the algorithm.
This is the code I am using to obtain an (undirected) graph instance upon which I call LocalClusteringCoefficient:
import org.apache.flink.graph.library.clustering.undirected.LocalClusteringCoefficient.Result;
import org.apache.flink.graph.library.clustering.undirected.LocalClusteringCoefficient;
/** other imports and method definitions **/
// Generate edge tuples from the input file.
final DataSet<Tuple2<LongValue, LongValue>> edgeTuples = env.readCsvFile(inputPath)
    .fieldDelimiter("\t") // node IDs are separated by tabs
    .ignoreComments("#") // comment lines start with "#"
    .types(LongValue.class, LongValue.class);
// Generate actual Edge<LongValue, Double> instances, each with weight 1.0.
@SuppressWarnings("serial")
final DataSet<Edge<LongValue, Double>> edges = edgeTuples.map(
    new MapFunction<Tuple2<LongValue, LongValue>, Edge<LongValue, Double>>() {
        @Override
        public Edge<LongValue, Double> map(Tuple2<LongValue, LongValue> arg0) throws Exception {
            return new Edge<LongValue, Double>(arg0.f0, arg0.f1, 1.0d);
        }
    });
// Generate the basic graph.
@SuppressWarnings("serial")
final Graph<LongValue, Double, Double> graph = Graph.fromDataSet(
    edges,
    new MapFunction<LongValue, Double>() {
        @Override
        public Double map(LongValue arg0) throws Exception {
            // For testing purposes, just setting each vertex value to 1.0.
            return 1.0;
        }
    },
    env).getUndirected();
// Execute the LocalClusteringCoefficient algorithm.
final DataSet<Result<LongValue>> localClusteringCoefficients =
    graph.run(new LocalClusteringCoefficient<LongValue, Double, Double>());
// Get the values as per Vasia's help:
@SuppressWarnings("serial")
DataSet<Double> CLUSTERING_COEFFICIENTS = localClusteringCoefficients.map(
    new MapFunction<Result<LongValue>, Double>() {
        @Override
        public Double map(Result<LongValue> arg0) throws Exception {
            return arg0.getLocalClusteringCoefficientScore();
        }
    });
I believe this is the correct way to get a DataSet<Double> of coefficients from a DataSet<Result<LongValue>>?
Among the coefficients are a lot of NaN values:
CLUSTERING_COEFFICIENTS.print();
NaN
0.0
NaN
NaN
NaN
NaN
NaN
NaN
NaN
NaN
NaN
NaN
NaN
NaN
NaN
NaN
NaN
NaN
NaN
NaN
NaN
NaN
NaN
NaN
NaN
NaN
NaN
NaN
NaN
NaN
NaN
NaN
NaN
NaN
NaN
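For context, the textbook local clustering coefficient of a vertex is the number of triangles through it divided by its number of neighbor pairs, d(d-1)/2 for a vertex of degree d. The following is a minimal plain-Java sketch of how that ratio turns into NaN for vertices of degree 0 or 1. It assumes the score is computed as this plain ratio, which is not confirmed for Gelly in this message; the class LocalClusteringSketch and the method score are hypothetical names and are not part of the Gelly API.

```java
// Hypothetical helper, NOT part of Gelly: illustrates the textbook definition only.
final class LocalClusteringSketch {

    // Local clustering coefficient of one vertex in an undirected simple graph:
    // triangles through the vertex divided by the number of neighbor pairs.
    static double score(long degree, long triangleCount) {
        long neighborPairs = degree * (degree - 1) / 2;
        // For degree 0 or 1 this evaluates 0.0 / 0.0, which is NaN in IEEE 754 arithmetic.
        return (double) triangleCount / (double) neighborPairs;
    }

    public static void main(String[] args) {
        System.out.println(score(1, 0)); // leaf vertex: prints NaN
        System.out.println(score(3, 0)); // three neighbors, no triangles: prints 0.0
        System.out.println(score(3, 3)); // three neighbors, all connected: prints 1.0
    }
}
```

Under that assumption, a graph in which most vertices have a single neighbor would yield mostly NaN scores, so comparing the vertex degrees against the NaN entries may be a useful first check.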
Apologies in advance for the verbosity, but to provide detail, printing the graph edges yields the output below. Notice that for each pair of vertices there are two edges: the original and the reversed copy produced by getUndirected().
Greg: I therefore believe the graph is undirected:
graph.getEdgesAsTuple3().print();
(5113,6008,1.0)
(6008,5113,1.0)
(5113,6774,1.0)
(6774,5113,1.0)
(5113,32938,1.0)
(32938,5113,1.0)
(5113,6545,1.0)
(6545,5113,1.0)
(5113,7088,1.0)
(7088,5113,1.0)
(5113,37929,1.0)
(37929,5113,1.0)
(5113,26562,1.0)
(26562,5113,1.0)
(5113,6107,1.0)
(6107,5113,1.0)
(5113,7171,1.0)
(7171,5113,1.0)
(5113,6192,1.0)
(6192,5113,1.0)
(5113,7763,1.0)
(7763,5113,1.0)
(9748,5113,1.0)
(5113,9748,1.0)
(10191,5113,1.0)
(5113,10191,1.0)
(6064,5113,1.0)
(5113,6064,1.0)
(6065,5113,1.0)
(5113,6065,1.0)
(6279,5113,1.0)
(5113,6279,1.0)
(4907,5113,1.0)
(5113,4907,1.0)
(6465,5113,1.0)
(5113,6465,1.0)
(6707,5113,1.0)
(5113,6707,1.0)
(7089,5113,1.0)
(5113,7089,1.0)
(7172,5113,1.0)
(5113,7172,1.0)
(14310,5113,1.0)
(5113,14310,1.0)
(6252,5113,1.0)
(5113,6252,1.0)
(33855,5113,1.0)
(5113,33855,1.0)
(7976,5113,1.0)
(5113,7976,1.0)
(26284,5113,1.0)
(5113,26284,1.0)
(8056,5113,1.0)
(5113,8056,1.0)
(10371,5113,1.0)
(5113,10371,1.0)
(16785,5113,1.0)
(5113,16785,1.0)
(19801,5113,1.0)
(5113,19801,1.0)
(6715,5113,1.0)
(5113,6715,1.0)
(31724,5113,1.0)
(5113,31724,1.0)
(32443,5113,1.0)
(5113,32443,1.0)
(10370,5113,1.0)
(5113,10370,1.0)
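The claim that every edge in this printout has a matching reverse edge can be checked mechanically instead of by eye. The following is a minimal plain-Java sketch, independent of Flink, that lists the edges whose reverse is missing; the class EdgeSymmetryCheck and the method missingReverseEdges are hypothetical names, and the sample array holds only a few of the pairs printed above.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper, NOT part of Gelly: checks that an edge list is symmetric.
final class EdgeSymmetryCheck {

    // Returns every edge {source, target} whose reverse {target, source} is absent.
    static List<long[]> missingReverseEdges(long[][] edges) {
        Set<String> seen = new HashSet<>();
        for (long[] e : edges) {
            seen.add(e[0] + "->" + e[1]);
        }
        List<long[]> missing = new ArrayList<>();
        for (long[] e : edges) {
            if (!seen.contains(e[1] + "->" + e[0])) {
                missing.add(e);
            }
        }
        return missing;
    }

    public static void main(String[] args) {
        // A few of the (source, target) pairs from the printout.
        long[][] sample = {
            {5113, 6008}, {6008, 5113},
            {5113, 6774}, {6774, 5113},
            {9748, 5113}, {5113, 9748}
        };
        // Prints the number of edges without a reverse counterpart.
        System.out.println(missingReverseEdges(sample).size());
    }
}
```

An empty result means the edge list is symmetric; it does not say anything about duplicate edges, which would need a separate count.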
Any insight into what I may be doing wrong would be greatly appreciated.
Thanks for your time,
Kind regards,