(DEPRECATED) Apache Flink User Mailing List archive.

Spargel pagerank with sinks

Classic

List

Threaded

4 messages Options

Attila Bernáth

Spargel pagerank with sinks

Dear All,

I wonder how to write the pagerank program in the spargel API if there
might be sinks in the graph.

What is the nicest way to solve this?

Thank you for your answer.

Attila

Stephan Ewen

Re: Spargel pagerank with sinks

Hi!

With "sinks" in the graph, you mean vertices with no out-links?

There might be a simple trick, by adding to each vertex an edge-to-self (put an entry in the diagonal of the adjacency matrix).

I have not thought through the implications 100%.

@ssc Can you elaborate on this?

What would always work is that you gather statistics about how much probability is accumulated in the sinks and redistribute it across the other nodes.

The iteration aggregators allow you to do this. They can sum up the probability in the message sender function (when there is no outgoing edge),

and re-add it to the non-sink nodes (by accessing the aggregate from the previous iteration).

Have a look at the function "registerAggregator()" on the "VertexCentricIteration", and the Functions "getIterationAggregator()" and "getPreviousIterationAggregate()" on the VertexUpdateFunction and the MessagingFunction.

Stephan

On Thu, Sep 18, 2014 at 5:01 PM, Attila Bernáth <[hidden email]> wrote:

Dear All,

I wonder how to write the pagerank program in the spargel API if there
might be sinks in the graph.

What is the nicest way to solve this?

Thank you for your answer.

Attila

Attila Bernáth

Re: Spargel pagerank with sinks

Dear Stephan,

Thank you for your answer.

> With "sinks" in the graph, you mean vertices with no out-links?
Yes, I do.

> There might be a simple trick, by adding to each vertex an edge-to-self (put
> an entry in the diagonal of the adjacency matrix).
>
> I have not thought through the implications 100%.
> @ssc Can you elaborate on this?
I don't think that this works.

>
>
> What would always work is that you gather statistics about how much
> probability is accumulated in the sinks and redistribute it across the other
> nodes.
>
> The iteration aggregators allow you to do this. They can sum up the
> probability in the message sender function (when there is no outgoing edge),
> and re-add it to the non-sink nodes (by accessing the aggregate from the
> previous iteration).

Thank you for the idea. I was trying to use an aggregator, but I
thought that the aggregate from the previous iteration is of no use.

I will try this.

Thanks.

Attila

>
> Have a look at the function "registerAggregator()" on the
> "VertexCentricIteration", and the Functions "getIterationAggregator()" and
> "getPreviousIterationAggregate()" on the VertexUpdateFunction and the
> MessagingFunction.
>
>
> Stephan
>
> On Thu, Sep 18, 2014 at 5:01 PM, Attila Bernáth <[hidden email]>
> wrote:
>>
>> Dear All,
>>
>> I wonder how to write the pagerank program in the spargel API if there
>> might be sinks in the graph.
>>
>> What is the nicest way to solve this?
>>
>> Thank you for your answer.
>>
>> Attila
>
>

Sebastian Schelter-2

Re: Spargel pagerank with sinks

Hi Attila,

The common solution for handling vertices with out outlinks is to
aggregate the probability mass that is assigned to them and to uniformly
redistribute this among all vertices in every iteration.

You can have a look at Giraph's RandomWalkVertex which uses this
approach:
https://github.com/apache/giraph/blob/release-1.0/giraph-examples/src/main/java/org/apache/giraph/examples/RandomWalkVertex.java

Best,
Sebastian

On 09/19/2014 10:36 AM, Attila Bernáth wrote:

> Dear Stephan,
>
> Thank you for your answer.
>
>> With "sinks" in the graph, you mean vertices with no out-links?
> Yes, I do.
>
>> There might be a simple trick, by adding to each vertex an edge-to-self (put
>> an entry in the diagonal of the adjacency matrix).
>>
>> I have not thought through the implications 100%.
>> @ssc Can you elaborate on this?
> I don't think that this works.
>
>
>>
>>
>> What would always work is that you gather statistics about how much
>> probability is accumulated in the sinks and redistribute it across the other
>> nodes.
>>
>> The iteration aggregators allow you to do this. They can sum up the
>> probability in the message sender function (when there is no outgoing edge),
>> and re-add it to the non-sink nodes (by accessing the aggregate from the
>> previous iteration).
> Thank you for the idea. I was trying to use an aggregator, but I
> thought that the aggregate from the previous iteration is of no use.
>
> I will try this.
>
> Thanks.
>
> Attila
>
>
>>
>> Have a look at the function "registerAggregator()" on the
>> "VertexCentricIteration", and the Functions "getIterationAggregator()" and
>> "getPreviousIterationAggregate()" on the VertexUpdateFunction and the
>> MessagingFunction.
>>
>>
>> Stephan
>>
>> On Thu, Sep 18, 2014 at 5:01 PM, Attila Bernáth <[hidden email]>
>> wrote:
>>>
>>> Dear All,
>>>
>>> I wonder how to write the pagerank program in the spargel API if there
>>> might be sinks in the graph.
>>>
>>> What is the nicest way to solve this?
>>>
>>> Thank you for your answer.
>>>
>>> Attila
>>
>>