Re: Spargel pagerank with sinks

Posted by Sebastian Schelter-2 on
URL: http://deprecated-apache-flink-user-mailing-list-archive.369.s1.nabble.com/Spargel-pagerank-with-sinks-tp92p95.html

Hi Attila,


The common solution for handling vertices with out outlinks is to
aggregate the probability mass that is assigned to them and to uniformly
redistribute this among all vertices in every iteration.

You can have a look at Giraph's RandomWalkVertex which uses this
approach:
https://github.com/apache/giraph/blob/release-1.0/giraph-examples/src/main/java/org/apache/giraph/examples/RandomWalkVertex.java


Best,
Sebastian

On 09/19/2014 10:36 AM, Attila BernĂ¡th wrote:

> Dear Stephan,
>
> Thank you for your answer.
>
>> With "sinks" in the graph, you mean vertices with no out-links?
> Yes, I do.
>
>> There might be a simple trick, by adding to each vertex an edge-to-self (put
>> an entry in the diagonal of the adjacency matrix).
>>
>> I have not thought through the implications 100%.
>> @ssc Can you elaborate on this?
> I don't think that this works.
>
>
>>
>>
>> What would always work is that you gather statistics about how much
>> probability is accumulated in the sinks and redistribute it across the other
>> nodes.
>>
>> The iteration aggregators allow you to do this. They can sum up the
>> probability in the message sender function (when there is no outgoing edge),
>> and re-add it to the non-sink nodes (by accessing the aggregate from the
>> previous iteration).
> Thank you for the idea. I was trying to use an aggregator, but I
> thought that the aggregate from the previous iteration is of no use.
>
> I will try this.
>
> Thanks.
>
> Attila
>
>
>>
>> Have a look at the function "registerAggregator()" on the
>> "VertexCentricIteration", and the Functions "getIterationAggregator()" and
>> "getPreviousIterationAggregate()" on the VertexUpdateFunction and the
>> MessagingFunction.
>>
>>
>> Stephan
>>
>> On Thu, Sep 18, 2014 at 5:01 PM, Attila BernĂ¡th <[hidden email]>
>> wrote:
>>>
>>> Dear All,
>>>
>>> I wonder how to write the pagerank program in the spargel API if there
>>> might be sinks in the graph.
>>>
>>> What is the nicest way to solve this?
>>>
>>> Thank you for your answer.
>>>
>>> Attila
>>
>>