Task Slots and Heterogeneous Tasks


Task Slots and Heterogeneous Tasks

Maxim
I'm trying to understand Flink's behavior with heterogeneous operations. For example, in our pipelines one operation might accumulate large windows while another makes high-latency calls to external services. Obviously the former needs a task slot with a large memory allocation, while the latter needs no memory but a high degree of parallelism.

Is there any way to have different slot types and control the allocation of operations to them? Or is there another way to ensure good hardware utilization?

Also, the documentation does not make clear whether the memory of a TaskManager is shared across all tasks running on it or whether each task gets its own quota. Could you clarify?

Thanks,

Maxim.



Re: Task Slots and Heterogeneous Tasks

Stephan Ewen
Hi!

Slots are usually shared between heavy and non-heavy tasks for exactly that reason.

Let us know if you have more questions!

Greetings,
Stephan


In reply to Maxim's message of Fri, Apr 15, 2016 at 1:20 AM.


Re: Task Slots and Heterogeneous Tasks

Till Rohrmann
Hi Maxim,

Concerning the second part of your question: the managed memory of a TaskManager is first split among the available slots. Each slot's portion of the managed memory is then split among all operators that require managed memory when a pipeline is executed. In contrast, the heap memory is shared by all concurrently running tasks.
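
To make the two-level split concrete, here is a rough arithmetic sketch in Java. It is not Flink code: the class and method names, the 4 GiB figure, and the operator names are all hypothetical, and it assumes the split is even at both levels, which the description above does not state explicitly.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Illustrative arithmetic only; this is not Flink's memory manager. */
final class ManagedMemorySketch {

    /** Managed memory for one slot, ASSUMING an even split across slots. */
    static long perSlotBytes(long taskManagerManagedBytes, int numSlots) {
        if (numSlots <= 0) {
            throw new IllegalArgumentException("numSlots must be positive");
        }
        return taskManagerManagedBytes / numSlots;
    }

    /**
     * Share of one slot's managed memory per operator, ASSUMING an even
     * split among the operators in that slot that request managed memory.
     */
    static Map<String, Long> perOperatorBytes(long slotBytes, List<String> operators) {
        Map<String, Long> shares = new LinkedHashMap<>();
        if (operators.isEmpty()) {
            return shares; // nobody asked for managed memory
        }
        long share = slotBytes / operators.size();
        for (String operator : operators) {
            shares.put(operator, share);
        }
        return shares;
    }

    public static void main(String[] args) {
        long managed = 4L * 1024 * 1024 * 1024;   // hypothetical: 4 GiB of managed memory
        long slot = perSlotBytes(managed, 4);     // hypothetical: 4 slots per TaskManager
        System.out.println("per slot: " + slot + " bytes");
        // Hypothetical operator names; only operators that use managed memory count.
        System.out.println(perOperatorBytes(slot, Arrays.asList("sort", "hashJoin")));
    }
}
```

Under these assumptions a slot's managed-memory share is fixed by the slot count, regardless of how heavy its tasks are, while heap usage is not partitioned this way.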

Cheers,
Till

In reply to Stephan Ewen's message of Fri, Apr 15, 2016 at 1:58 PM.



Re: Task Slots and Heterogeneous Tasks

Maxim
I see. Sharing slots among subtasks makes sense.
So by default, each subtask that executes a map function calling a high-latency external service is going to be put in a separate slot. Is it possible to tell Flink that subtasks of a particular operation can be collocated in one slot, since such subtasks are I/O-bound and require no shared memory?

In reply to Till Rohrmann's message of Fri, Apr 15, 2016 at 5:31 AM.


Re: Task Slots and Heterogeneous Tasks

Till Rohrmann
No, it's not possible at the moment to deploy more than one task of the same kind to a single slot.
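
A small Java sketch of what this constraint implies, combined with the slot sharing mentioned earlier in the thread. It is a simplified model, not Flink's scheduler: the class, method, and operator names are hypothetical, and it assumes all operators share slots and that subtask i of every operator lands in slot i.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Simplified model of slot sharing; this is not Flink's scheduler. */
final class SlotSharingSketch {

    /**
     * With at most one subtask of each operator per slot, the job needs
     * as many slots as its highest operator parallelism.
     */
    static int slotsNeeded(Map<String, Integer> parallelismByOperator) {
        int max = 0;
        for (int parallelism : parallelismByOperator.values()) {
            max = Math.max(max, parallelism);
        }
        return max;
    }

    /** ASSUMED placement: subtask i of every operator goes into slot i. */
    static List<List<String>> assign(Map<String, Integer> parallelismByOperator) {
        List<List<String>> slots = new ArrayList<>();
        for (int s = 0; s < slotsNeeded(parallelismByOperator); s++) {
            slots.add(new ArrayList<>());
        }
        for (Map.Entry<String, Integer> op : parallelismByOperator.entrySet()) {
            for (int i = 0; i < op.getValue(); i++) {
                slots.get(i).add(op.getKey() + "#" + i);
            }
        }
        return slots;
    }

    public static void main(String[] args) {
        Map<String, Integer> job = new LinkedHashMap<>();
        job.put("window", 2);           // hypothetical memory-heavy operator
        job.put("externalCallMap", 8);  // hypothetical I/O-bound operator
        System.out.println("slots needed: " + slotsNeeded(job));
        for (List<String> slot : assign(job)) {
            System.out.println(slot);
        }
    }
}
```

In this model, raising the parallelism of the I/O-bound operator raises the number of slots required one for one, since two of its subtasks can never share a slot.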

In reply to Maxim's message of Fri, Apr 15, 2016 at 8:08 PM.