Hi Mailing list,
using a RichMapPartitionFunction i can access the total number m of this mapper utilized in my job with int m = getRuntimeContext().getNumberOfParallelSubtasks(); I think that would be - in general - the total number of CPU Cores used by Apache Flink among the cluster. Is there a way to access the number of the following reducer? In general i would assume that the number of the following reducers depends on the number of groups generated by the groupBy() transformation. So the number of the reducer r would be 1 <= r <= m. My Job: DataSet<?> output = input .mapPartition(new MR_GPMRS_Mapper()) .groupBy(0) .reduceGroup(new MR_GPMRS_Reducer()); Thank you in advance Robert |
Within the mapper you cannot access the parallelism of the following nor
preceding operation. On 20.06.2016 15:56, Paschek, Robert wrote: > Hi Mailing list, > > using a RichMapPartitionFunction i can access the total number m of this mapper utilized in my job with > int m = getRuntimeContext().getNumberOfParallelSubtasks(); > > I think that would be - in general - the total number of CPU Cores used by Apache Flink among the cluster. > > Is there a way to access the number of the following reducer? > > In general i would assume that the number of the following reducers depends on the number of groups generated by the groupBy() transformation. So the number of the reducer r would be 1 <= r <= m. > > My Job: > DataSet<?> output = input > .mapPartition(new MR_GPMRS_Mapper()) > .groupBy(0) > .reduceGroup(new MR_GPMRS_Reducer()); > > Thank you in advance > Robert |
Hi Robert, the number of parallel subtasks is the parallelism of the job or the individual operator. Only when executing Flink locally, the parallelism is set to the CPU cores. The number of groups generated by the groupBy() transformation doesn't affect the parallelism. Very often the number of groups is much higher than the parallelism, in those cases, each parallel instance will process multiple groups. If you want to know the parallelism of your operators globally, you'll need to set it manually (say all operators to a parallelism of 8). Regards, Robert On Mon, Jun 20, 2016 at 10:00 PM, Chesnay Schepler <[hidden email]> wrote: Within the mapper you cannot access the parallelism of the following nor preceding operation. |
Hi Chesnay, hi Robert Thank you for your explanations : - ) (And sorry for the late reply). Regards, Robert Von: Robert Metzger [mailto:[hidden email]]
Hi Robert, the number of parallel subtasks is the parallelism of the job or the individual operator. Only when executing Flink locally, the parallelism is set to the CPU cores. The number of groups generated by the groupBy() transformation doesn't affect the parallelism. Very often the number of groups is much higher than the parallelism, in those cases, each parallel instance will process multiple groups. If you want to know the parallelism of your operators globally, you'll need to set it manually (say all operators to a parallelism of 8). Regards, Robert On Mon, Jun 20, 2016 at 10:00 PM, Chesnay Schepler <[hidden email]> wrote: Within the mapper you cannot access the parallelism of the following nor preceding operation.
Hi Mailing list, |
Free forum by Nabble | Edit this page |