partition identifier

Attila Bernáth
Dear Developers,

Datasets are partitioned between machines. I wonder whether there is a way
to get an identifier of a partition. I see that the class HashPartition
has a getPartitionNumber method, but I don't see how I could use it.
(For example, I would like to see the partition identifier in a
MapFunction or in a MapPartitionFunction.)

Attila
Re: partition identifier

Stephan Ewen
Hi!

You can always use the "rich" version of the function, for example the "RichMapFunction". Inside that function, you can call "getRuntimeContext()", which gives you access to many things, among them the partition number.
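For illustration, here is a minimal sketch of that suggestion against the DataSet-era Java API (0.7 at the time of this thread). It uses getIndexOfThisSubtask(), the method identified later in the thread. The class name TagWithSubtaskIndex is hypothetical. Newer Flink releases deprecate open(Configuration) and RuntimeContext.getIndexOfThisSubtask(), and expose the index through getRuntimeContext().getTaskInfo() instead, so check the documentation for your version.

```java
import org.apache.flink.api.common.functions.RichMapFunction;
import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.configuration.Configuration;

/** Tags every record with the index of the parallel subtask (partition) that processed it. */
public class TagWithSubtaskIndex extends RichMapFunction<String, Tuple2<Integer, String>> {

    private int subtaskIndex;

    @Override
    public void open(Configuration parameters) throws Exception {
        // The runtime context is only available once the function runs inside a task,
        // so read the index here (once per parallel instance), not in the constructor.
        subtaskIndex = getRuntimeContext().getIndexOfThisSubtask();
    }

    @Override
    public Tuple2<Integer, String> map(String value) {
        return new Tuple2<>(subtaskIndex, value);
    }
}
```

Usage would look like `DataSet<Tuple2<Integer, String>> tagged = lines.map(new TagWithSubtaskIndex());`, where `lines` is some hypothetical `DataSet<String>`. The index ranges from 0 to getRuntimeContext().getNumberOfParallelSubtasks() - 1.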

Stephan


In reply to Attila Bernáth (Wed, Dec 3, 2014 at 3:49 PM)

Re: partition identifier

Stephan Ewen
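Hey!

Here is a brief description of how to use rich functions:
http://flink.incubator.apache.org/docs/0.7-incubating/programming_guide.html#passing-functions-to-flink

Greetings,
Stephan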


In reply to Stephan Ewen (Wed, Dec 3, 2014 at 3:52 PM)

Re: partition identifier

Attila Bernáth
Thank you, Stephan.
How do I access the partition number from the RuntimeContext?

Attila

In reply to Stephan Ewen (2014-12-03 15:53 GMT+01:00)

Re: partition identifier

Attila Bernáth
I think I have found it: it must be
getRuntimeContext().getIndexOfThisSubtask();
Attila
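The same call also works in a RichMapPartitionFunction, the other case asked about in the first message. The hypothetical CountPerPartition below emits one (subtask index, record count) pair per parallel instance, which is a quick way to see how a DataSet is spread over the partitions. It is a sketch that assumes the same DataSet-era API and Flink on the classpath.

```java
import org.apache.flink.api.common.functions.RichMapPartitionFunction;
import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.util.Collector;

/** Emits one (partitionIndex, recordCount) pair for each parallel instance. */
public class CountPerPartition extends RichMapPartitionFunction<String, Tuple2<Integer, Long>> {

    @Override
    public void mapPartition(Iterable<String> values, Collector<Tuple2<Integer, Long>> out) {
        // mapPartition is invoked once per parallel instance with all of its records.
        int partition = getRuntimeContext().getIndexOfThisSubtask();
        long count = 0;
        for (String ignored : values) {
            count++;
        }
        out.collect(new Tuple2<>(partition, count));
    }
}
```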

In reply to Attila Bernáth (2014-12-03 16:12 GMT+01:00)

Re: partition identifier

Aljoscha Krettek
In reply to this post by Attila Bernáth
RuntimeContext.getIndexOfThisSubtask()

What do you want to use this partition number for, if I may ask?

Cheers,
Aljoscha

Re: partition identifier

Attila Bernáth
I am trying to write some code that is cleverer than the optimizer.
The idea is that in Spargel you often want to send the same message to
many other graph nodes. These target nodes are partitioned among the
machines of your cluster, so it would make sense to send the message
to each target machine only once and let that machine distribute it to
the nodes it holds.

Attila
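Here is a minimal plain-Java sketch of that idea. It does not use the Spargel API. It groups the target vertex IDs by the partition that owns them, so that one message goes to each partition together with its list of local recipients. The class PartitionRouting and its partitioner partitionOf (a floorMod over Long.hashCode) are hypothetical. A real implementation would have to use exactly the function that actually partitioned the vertices; otherwise messages would arrive at the wrong subtask.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class PartitionRouting {

    // HYPOTHETICAL partitioner: it must match the function that really
    // distributed the vertices, or messages end up on the wrong partition.
    static int partitionOf(long vertexId, int numPartitions) {
        return Math.floorMod(Long.hashCode(vertexId), numPartitions);
    }

    // Groups the targets of one message by their owning partition, so the
    // message can be shipped once per partition instead of once per target.
    static Map<Integer, List<Long>> groupByPartition(List<Long> targets, int numPartitions) {
        Map<Integer, List<Long>> byPartition = new TreeMap<>();
        for (long target : targets) {
            byPartition
                .computeIfAbsent(partitionOf(target, numPartitions), p -> new ArrayList<>())
                .add(target);
        }
        return byPartition;
    }

    public static void main(String[] args) {
        List<Long> targets = Arrays.asList(1L, 2L, 3L, 4L, 5L);
        Map<Integer, List<Long>> routed = groupByPartition(targets, 2);
        System.out.println(routed); // {0=[2, 4], 1=[1, 3, 5]}
        System.out.println("messages sent: " + routed.size() + " instead of " + targets.size());
    }
}
```

On the receiving side, each partition would then deliver the single incoming message to every vertex in its local target list.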

In reply to Aljoscha Krettek (2014-12-03 16:21 GMT+01:00)