Dear Developers,
Datasets are partitioned between machines. I wonder if there is a way to get some identifier of a partition. I see that the class HashPartition has a getPartitionNumber method, but I don't see how I could use this. (For example, I would like to see the partition identifier in a MapFunction, or in a MapPartitionFunction.)
Attila
Hi!
You can always use the "rich" version of the function, for example the "RichMapFunction". Inside that function, you can call "getRuntimeContext()", which gives you access to many things, among them the partition number.
Stephan
On Wed, Dec 3, 2014 at 3:49 PM, Attila Bernáth <[hidden email]> wrote:
Hey!
Here is a brief description of how to use rich functions: http://flink.incubator.apache.org/docs/0.7-incubating/programming_guide.html#passing-functions-to-flink
Greetings,
Stephan
On Wed, Dec 3, 2014 at 3:52 PM, Stephan Ewen <[hidden email]> wrote:
Thank you, Stephan.
How do I access the partition number from the RuntimeContext?
Attila
2014-12-03 15:53 GMT+01:00 Stephan Ewen <[hidden email]>:
I think I have found it: it must be
getRuntimeContext().getIndexOfThisSubtask();
Attila
2014-12-03 16:12 GMT+01:00 Attila Bernáth <[hidden email]>:
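For readers who want to see the call in context, here is a minimal sketch of a rich map function that tags each record with the index of the parallel subtask that processed it. The class name `TagWithSubtask` and the choice of `Long` input are illustrative assumptions, not something from this thread. The import path for `RichMapFunction` is the one I believe applies to 0.7 and later releases; recent Flink versions may deprecate `getIndexOfThisSubtask()` on the runtime context, so check the Javadoc of the version you use. The snippet needs the Flink Java API on the classpath.

```java
import org.apache.flink.api.common.functions.RichMapFunction;
import org.apache.flink.api.java.tuple.Tuple2;

// Hypothetical example class: pairs every input value with the index of the
// parallel subtask (0 .. parallelism-1) that handled it.
public class TagWithSubtask extends RichMapFunction<Long, Tuple2<Integer, Long>> {

    @Override
    public Tuple2<Integer, Long> map(Long value) throws Exception {
        // The runtime context is only available in the "rich" function variants.
        int subtask = getRuntimeContext().getIndexOfThisSubtask();
        return new Tuple2<Integer, Long>(subtask, value);
    }
}
```

It would be applied like any other map function, for example `input.map(new TagWithSubtask())` on a `DataSet<Long>`. Note that the subtask index identifies the parallel instance of the operator, which is what the thread treats as the partition number.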
In reply to this post by Attila Bernáth
RuntimeContext.getIndexOfThisSubtask()
What do you want to use this partition number for, if I may ask?
Cheers,
Aljoscha
I am trying to write some code that is cleverer than the optimizer.
The idea is that in Spargel you often want to send the same message to many other graph nodes. These target nodes are partitioned between the machines of your cluster, so it would make sense to send the message to each target machine only once and have that machine distribute it to the nodes it holds.
Attila
2014-12-03 16:21 GMT+01:00 Aljoscha Krettek <[hidden email]>:
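The grouping step behind this idea can be sketched in plain Java, independent of Flink. Everything here is an assumption made for illustration: the class name, the method names, and above all `partitionOf`, which is a simple modulo stand-in and not how Flink actually assigns keys to partitions. The sketch only shows that one message per partition replaces one message per target node.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class PerPartitionMessages {

    // Hypothetical partitioner, NOT Flink's real hash partitioning:
    // maps a vertex id to a partition index in [0, parallelism).
    static int partitionOf(long vertexId, int parallelism) {
        return (int) Math.floorMod(vertexId, (long) parallelism);
    }

    // Groups the target vertex ids by the partition that would hold them,
    // so that a single message per partition can carry all of its targets.
    static Map<Integer, List<Long>> groupByPartition(List<Long> targets, int parallelism) {
        Map<Integer, List<Long>> grouped = new TreeMap<Integer, List<Long>>();
        for (long target : targets) {
            int partition = partitionOf(target, parallelism);
            List<Long> bucket = grouped.get(partition);
            if (bucket == null) {
                bucket = new ArrayList<Long>();
                grouped.put(partition, bucket);
            }
            bucket.add(target);
        }
        return grouped;
    }

    public static void main(String[] args) {
        List<Long> targets = Arrays.asList(1L, 2L, 5L, 6L, 9L);
        // Five targets fall into two partitions, so two messages suffice.
        System.out.println(groupByPartition(targets, 4)); // prints {1=[1, 5, 9], 2=[2, 6]}
    }
}
```

On the receiving side, each parallel instance would compare its own subtask index with the partition index in the message and fan the payload out to the listed nodes. Whether that matches the actual data placement depends on the real partitioner, which this sketch does not model.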