Thanks, that is working now!
I have one last question.
Going one step further, I have changed the vertex value type to a POJO class. The structure is roughly like this:
class LocalStorage {
    Integer id;
    Long degree;
    Boolean active;
    List<Long> labels;
    Map<Long, Long> neighborDegree;
    ….
}
During the execution, I got an error saying that `org.apache.flink.api.common.InvalidProgramException: This type (GenericType<java.util.ArrayList>) cannot be used as key`.
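A possible explanation, offered as a guess rather than a confirmed diagnosis: Flink only analyzes a class as a POJO when it is public (and static, if nested), has a public no-argument constructor, and every non-static, non-transient field is either public or has a public getter and setter. The `LocalStorage` class above is package-private with package-private fields, so Flink would fall back to a generic (Kryo-serialized) type. A generic type can only act as a key if its class implements `Comparable`, which `ArrayList` does not, and that matches the error message. The sketch below reshapes the listed fields to follow the POJO rules; the wrapper class `LocalStorageExample` is hypothetical and the elided `….` fields are left out. As far as I know, the `List` and `Map` fields would still be handled as generic types inside the POJO.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class LocalStorageExample {

    // Public static nested class: a nested class must be static to be analyzed as a POJO.
    public static class LocalStorage {
        // Public fields (or public getters/setters) are required for POJO analysis.
        public Integer id;
        public Long degree;
        public Boolean active;
        public List<Long> labels;
        public Map<Long, Long> neighborDegree;

        // Public no-argument constructor, needed so the serializer can instantiate the class.
        // The defaults only avoid nulls in freshly created objects.
        public LocalStorage() {
            this.active = Boolean.FALSE;
            this.labels = new ArrayList<>();
            this.neighborDegree = new HashMap<>();
        }
    }

    public static void main(String[] args) {
        LocalStorage s = new LocalStorage();
        s.id = 1;
        s.degree = 2L;
        s.active = true;
        s.labels.add(7L);
        s.neighborDegree.put(3L, 5L);
        System.out.println(s.id + " " + s.active + " " + s.labels + " " + s.neighborDegree);
    }
}
```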
After some reading, I implemented the Value and Comparable interfaces. But now, after the scatter phase ends, the LocalStorage values do not seem to persist. For example, I set the `active` flag to true in the scatter phase, but during the gather phase the `active` flag is null. (Note that I already know the degree and id are accessible within the vertex context; I am just trying to see what I can do with the framework.)
I am guessing this is a serialization/deserialization issue. I did some digging online and searched GitHub, but I couldn't find a fix. I have tried some approaches suggested on Stack Overflow, but they didn't work for me. Is this problem related to my POJO class having List and Map fields? Is that supported?
It would be great if someone could point me to a similar example or an easy fix.
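One likely cause, again a guess: when a type implements Flink's `Value` interface, Flink serializes it through the type's own `write(DataOutputView)` and `read(DataInputView)` methods, so a field those methods skip (for example `active`) is not transferred and keeps whatever the receiving instance holds, typically `null` for an uninitialized `Boolean`. The sketch below writes and reads every field, including the list, the map and null values. To stay runnable without Flink on the classpath it targets `java.io.DataOutput`/`DataInput`, which Flink's `DataOutputView`/`DataInputView` extend; the class name `LocalStorageSerde` is hypothetical, and the `Value`/`Comparable` declarations are omitted.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class LocalStorageSerde {

    public static class LocalStorage {
        public Integer id;
        public Long degree;
        public Boolean active;
        public List<Long> labels = new ArrayList<>();
        public Map<Long, Long> neighborDegree = new HashMap<>();

        // Counterpart of Value.write(DataOutputView): every field must be written here.
        public void write(DataOutput out) throws IOException {
            // Boxed fields get a presence flag first, so null survives the round trip.
            out.writeBoolean(id != null);
            if (id != null) out.writeInt(id);
            out.writeBoolean(degree != null);
            if (degree != null) out.writeLong(degree);
            out.writeBoolean(active != null);
            if (active != null) out.writeBoolean(active);
            // Collections: size prefix, then the entries (null elements are not supported here).
            out.writeInt(labels.size());
            for (long label : labels) out.writeLong(label);
            out.writeInt(neighborDegree.size());
            for (Map.Entry<Long, Long> e : neighborDegree.entrySet()) {
                out.writeLong(e.getKey());
                out.writeLong(e.getValue());
            }
        }

        // Counterpart of Value.read(DataInputView): read fields in exactly the order write() used.
        public void read(DataInput in) throws IOException {
            id = in.readBoolean() ? in.readInt() : null;
            degree = in.readBoolean() ? in.readLong() : null;
            active = in.readBoolean() ? in.readBoolean() : null;
            labels = new ArrayList<>();
            int labelCount = in.readInt();
            for (int i = 0; i < labelCount; i++) labels.add(in.readLong());
            neighborDegree = new HashMap<>();
            int entryCount = in.readInt();
            for (int i = 0; i < entryCount; i++) {
                long key = in.readLong();
                neighborDegree.put(key, in.readLong());
            }
        }
    }

    public static void main(String[] args) throws IOException {
        LocalStorage original = new LocalStorage();
        original.id = 1;
        original.degree = 3L;
        original.active = true;
        original.labels.add(42L);
        original.neighborDegree.put(2L, 5L);

        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        original.write(new DataOutputStream(bytes));

        LocalStorage copy = new LocalStorage();
        copy.read(new DataInputStream(new ByteArrayInputStream(bytes.toByteArray())));
        System.out.println(copy.active + " " + copy.labels + " " + copy.neighborDegree);
    }
}
```

If the flag still comes back as null with symmetric `write`/`read` methods like these, the problem is more likely elsewhere in the iteration.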
Best
Kaan
Hi Kaan,
I think what you are proposing is something like this:
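Graph<Long, Double, Double> graph = ... // get first batch
// scatterFunction, gatherFunction and maxIterations are placeholders for your own
// ScatterFunction, GatherFunction and iteration limit
Graph<Long, Double, Double> graphAfterFirstSG =
    graph.runScatterGatherIteration(scatterFunction, gatherFunction, maxIterations);
Graph<Long, Double, Double> secondBatch = ... // get second batch
// Adjust the result of the SG iteration with secondBatch (or use difference(secondBatch))
Graph<Long, Double, Double> updatedGraph = graphAfterFirstSG.union(secondBatch);
Graph<Long, Double, Double> result =
    updatedGraph.runScatterGatherIteration(scatterFunction, gatherFunction, maxIterations);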
Then I believe this should work.
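To make the chaining idea concrete without a cluster, here is a plain-Java simulation, not Gelly code: a scatter-gather style connected-components pass (each vertex scatters its current label along its edges, and each vertex gathers the minimum label it receives) runs on a first batch, the second batch is unioned in, and the pass runs again starting from the labels of the first run. The class `ChainedScatterGather`, its methods and the map/edge-list representation are hypothetical; in Gelly the two passes would be the two `runScatterGatherIteration` calls in the snippet above.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class ChainedScatterGather {

    // Min-label propagation in supersteps; edges are treated as undirected.
    public static Map<Long, Long> run(Map<Long, Long> labels, List<long[]> edges, int maxIterations) {
        Map<Long, Long> current = new TreeMap<>(labels);
        for (int i = 0; i < maxIterations; i++) {
            // Scatter phase: every vertex sends its current label to its neighbours.
            Map<Long, Long> inbox = new HashMap<>();
            for (long[] e : edges) {
                inbox.merge(e[1], current.get(e[0]), Long::min);
                inbox.merge(e[0], current.get(e[1]), Long::min);
            }
            // Gather phase: adopt the smallest received label if it is lower than the current one.
            boolean changed = false;
            for (Map.Entry<Long, Long> msg : inbox.entrySet()) {
                if (msg.getValue() < current.get(msg.getKey())) {
                    current.put(msg.getKey(), msg.getValue());
                    changed = true;
                }
            }
            if (!changed) {
                break; // converged: no vertex updated its value
            }
        }
        return current;
    }

    // Vertex union: new vertices are added, existing vertices keep their computed values.
    public static Map<Long, Long> unionVertices(Map<Long, Long> base, Map<Long, Long> batch) {
        Map<Long, Long> out = new TreeMap<>(batch);
        out.putAll(base);
        return out;
    }

    public static void main(String[] args) {
        // First batch: vertices 1..4 labelled with their own ids, edges 1-2 and 3-4.
        Map<Long, Long> vertices = new TreeMap<>();
        for (long v = 1; v <= 4; v++) vertices.put(v, v);
        List<long[]> edges = new ArrayList<>();
        edges.add(new long[] {1, 2});
        edges.add(new long[] {3, 4});
        Map<Long, Long> afterFirst = run(vertices, edges, 10);
        System.out.println(afterFirst); // {1=1, 2=1, 3=3, 4=3}

        // Second batch: new vertex 5 and edges 2-3 and 4-5, unioned into the first result.
        Map<Long, Long> batchVertices = new TreeMap<>();
        batchVertices.put(5L, 5L);
        List<long[]> updatedEdges = new ArrayList<>(edges);
        updatedEdges.add(new long[] {2, 3});
        updatedEdges.add(new long[] {4, 5});
        Map<Long, Long> afterSecond = run(unionVertices(afterFirst, batchVertices), updatedEdges, 10);
        System.out.println(afterSecond); // {1=1, 2=1, 3=1, 4=1, 5=1}
    }
}
```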
Cheers,
Till
Thanks for the useful information! It seems like a good and fun idea to experiment with. I will definitely give it a try.
I have a deadline coming up very soon, and I have already implemented the Scatter-Gather iteration algorithm.
I have another question: can we chain Scatter-Gather or Vertex-Centric iterations?
Let's say that we have an initial batch/dataset, and we run a Scatter-Gather iteration to obtain a graph.
Using another batch, we add/delete vertices in the graph we obtained.
Now we run another Scatter-Gather iteration on the modified graph.
This is not streaming, but a naive way to simulate batch updates that happen concurrently.
Do you think this is a feasible approach?
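The "add/delete vertices" step between two iterations can be sketched in plain Java as below, under the assumption that vertex values are kept in a map and edges in a list of `{source, target}` pairs. The class `BatchUpdate` and its `apply` method are hypothetical, not a Gelly API. Added vertices keep any value a previous run already computed, and deleting a vertex also drops its incident edges so the next iteration never messages a missing vertex.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

public class BatchUpdate {

    // Applies one update batch in place: add vertices and edges, then delete vertices and their edges.
    public static void apply(Map<Long, Long> vertices, List<long[]> edges,
                             Map<Long, Long> addedVertices, List<long[]> addedEdges,
                             Collection<Long> deletedVertices) {
        for (Map.Entry<Long, Long> v : addedVertices.entrySet()) {
            vertices.putIfAbsent(v.getKey(), v.getValue()); // keep values from the previous run
        }
        edges.addAll(addedEdges);
        Set<Long> deleted = new HashSet<>(deletedVertices);
        vertices.keySet().removeAll(deleted);
        // Drop dangling edges that touch a deleted vertex.
        edges.removeIf(e -> deleted.contains(e[0]) || deleted.contains(e[1]));
    }

    public static void main(String[] args) {
        // Values as if a previous connected-components run had labelled the path 1-2-3 with 1.
        Map<Long, Long> vertices = new TreeMap<>();
        vertices.put(1L, 1L);
        vertices.put(2L, 1L);
        vertices.put(3L, 1L);
        List<long[]> edges = new ArrayList<>();
        edges.add(new long[] {1, 2});
        edges.add(new long[] {2, 3});

        Map<Long, Long> added = new TreeMap<>();
        added.put(4L, 4L);
        List<long[]> addedEdges = new ArrayList<>();
        addedEdges.add(new long[] {3, 4});

        apply(vertices, edges, added, addedEdges, Arrays.asList(2L));
        System.out.println(vertices + " edges=" + edges.size()); // {1=1, 3=1, 4=4} edges=1
    }
}
```

Note that vertex 3 keeps label 1 from the earlier run even though it is no longer connected to vertex 1. A minimum-label propagation restarted from these values can only lower labels, so after deletions it may be necessary to reset the values of affected vertices (or rerun from scratch) before the next iteration.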
Best
Kaan
On Apr 13, 2020, at 11:16 PM, Tzu-Li (Gordon) Tai <[hidden email]> wrote: