Getting the NumberOfParallelSubtask

classic Classic list List threaded Threaded
4 messages Options
Reply | Threaded
Open this post in threaded view
|

Getting the NumberOfParallelSubtask

Paschek, Robert
Hi Mailing list,

using a RichMapPartitionFunction i can access the total number m of this mapper utilized in my job with
int m = getRuntimeContext().getNumberOfParallelSubtasks();

I think that would be - in general - the total number of CPU Cores used by Apache Flink among the cluster.

Is there a way to access the number of the following reducer?

In general i would assume that the number of the following reducers depends on the number of groups generated by the groupBy() transformation. So the number of the reducer r would be 1 <= r <= m.

My Job:
DataSet<?> output = input
                                .mapPartition(new MR_GPMRS_Mapper())
                                .groupBy(0)
                                .reduceGroup(new MR_GPMRS_Reducer());

Thank you in advance
Robert
Reply | Threaded
Open this post in threaded view
|

Re: Getting the NumberOfParallelSubtask

Chesnay Schepler
Within the mapper you cannot access the parallelism of the following nor
preceding operation.

On 20.06.2016 15:56, Paschek, Robert wrote:

> Hi Mailing list,
>
> using a RichMapPartitionFunction i can access the total number m of this mapper utilized in my job with
> int m = getRuntimeContext().getNumberOfParallelSubtasks();
>
> I think that would be - in general - the total number of CPU Cores used by Apache Flink among the cluster.
>
> Is there a way to access the number of the following reducer?
>
> In general i would assume that the number of the following reducers depends on the number of groups generated by the groupBy() transformation. So the number of the reducer r would be 1 <= r <= m.
>
> My Job:
> DataSet<?> output = input
> .mapPartition(new MR_GPMRS_Mapper())
> .groupBy(0)
> .reduceGroup(new MR_GPMRS_Reducer());
>
> Thank you in advance
> Robert

Reply | Threaded
Open this post in threaded view
|

Re: Getting the NumberOfParallelSubtask

rmetzger0
Hi Robert,

the number of parallel subtasks is the parallelism of the job or the individual operator. Only when executing Flink locally, the parallelism is set to the CPU cores.
The number of groups generated by the groupBy() transformation doesn't affect the parallelism. Very often the number of groups is much higher than the parallelism, in those cases, each parallel instance will process multiple groups.

If you want to know the parallelism of your operators globally, you'll need to set it manually (say all operators to a parallelism of 8).

Regards,
Robert


On Mon, Jun 20, 2016 at 10:00 PM, Chesnay Schepler <[hidden email]> wrote:
Within the mapper you cannot access the parallelism of the following nor preceding operation.


On 20.06.2016 15:56, Paschek, Robert wrote:
Hi Mailing list,

using a RichMapPartitionFunction i can access the total number m of this mapper utilized in my job with
int m = getRuntimeContext().getNumberOfParallelSubtasks();

I think that would be - in general - the total number of CPU Cores used by Apache Flink among the cluster.

Is there a way to access the number of the following reducer?

In general i would assume that the number of the following reducers depends on the number of groups generated by the groupBy() transformation. So the number of the reducer r would be 1 <= r <= m.

My Job:
DataSet<?> output = input
                                .mapPartition(new MR_GPMRS_Mapper())
                                .groupBy(0)
                                .reduceGroup(new MR_GPMRS_Reducer());

Thank you in advance
Robert


Reply | Threaded
Open this post in threaded view
|

AW: Getting the NumberOfParallelSubtask

Paschek, Robert

Hi Chesnay, hi Robert

 

Thank you for your explanations : - )

(And sorry for the late reply).

 

Regards,

Robert

 

Von: Robert Metzger [mailto:[hidden email]]
Gesendet: Dienstag, 21. Juni 2016 12:12
An: [hidden email]
Betreff: Re: Getting the NumberOfParallelSubtask

 

Hi Robert,

 

the number of parallel subtasks is the parallelism of the job or the individual operator. Only when executing Flink locally, the parallelism is set to the CPU cores.

The number of groups generated by the groupBy() transformation doesn't affect the parallelism. Very often the number of groups is much higher than the parallelism, in those cases, each parallel instance will process multiple groups.

 

If you want to know the parallelism of your operators globally, you'll need to set it manually (say all operators to a parallelism of 8).

 

Regards,

Robert

 

 

On Mon, Jun 20, 2016 at 10:00 PM, Chesnay Schepler <[hidden email]> wrote:

Within the mapper you cannot access the parallelism of the following nor preceding operation.



On 20.06.2016 15:56, Paschek, Robert wrote:

Hi Mailing list,

using a RichMapPartitionFunction i can access the total number m of this mapper utilized in my job with
int m = getRuntimeContext().getNumberOfParallelSubtasks();

I think that would be - in general - the total number of CPU Cores used by Apache Flink among the cluster.

Is there a way to access the number of the following reducer?

In general i would assume that the number of the following reducers depends on the number of groups generated by the groupBy() transformation. So the number of the reducer r would be 1 <= r <= m.

My Job:
DataSet<?> output = input
                                .mapPartition(new MR_GPMRS_Mapper())
                                .groupBy(0)
                                .reduceGroup(new MR_GPMRS_Reducer());

Thank you in advance
Robert