(DEPRECATED) Apache Flink User Mailing List archive.

threading and distribution

Marco Villalobos-2

threading and distribution

As data flows from a source through a pipeline of operators and finally into sinks, is there a way to control how many threads are used within an operator, and how an operator is distributed across the network?

Where can I read up on these types of details specifically?

Marco Villalobos-2

Re: threading and distribution

Okay, I am following up on my question. I found information about the threading and distribution model in the architecture documentation.

https://ci.apache.org/projects/flink/flink-docs-release-1.12/concepts/flink-architecture.html

Next, I want to read up on what I have control over.

On Fri, Feb 5, 2021 at 3:06 AM Marco Villalobos <[hidden email]> wrote:

As data flows from a source through a pipeline of operators and finally into sinks, is there a way to control how many threads are used within an operator, and how an operator is distributed across the network?

Where can I read up on these types of details specifically?

Matthias

Re: threading and distribution

Hi Marco,

Sorry for the late reply. The documentation you found [1] is a good start. You can define how many subtasks of an operator run in parallel through the operator's parallelism setting [2]. Each parallel subtask of an operator runs in a separate task slot. Slot sharing, described in [3], lets Flink run subtasks of different operators of the same job in the same slot, so a single slot can hold an entire pipeline of the job [3].

The maximum parallelism of your job is bounded by the number of available task slots in the Flink cluster. That number is determined by the number of slots per TaskManager [4][5] and the number of TaskManagers running in the cluster (taskmanager.numberOfTaskSlots * #taskmanagers = maximum possible parallelism for an operator/pipeline).
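The slot arithmetic above can be sketched in plain Java, with no Flink dependency. The class name SlotMath, its method names, and the example numbers are hypothetical and only for illustration. The sketch assumes that all operators stay in the default slot sharing group, in which case a job needs as many slots as its highest operator parallelism.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper that models the slot arithmetic; it is not part of Flink.
public class SlotMath {

    // taskmanager.numberOfTaskSlots * #taskmanagers
    static int totalSlots(int slotsPerTaskManager, int taskManagers) {
        return slotsPerTaskManager * taskManagers;
    }

    // Assumption: default slot sharing, so one slot can hold one subtask
    // of every operator. The job then needs max(parallelism) slots.
    static int slotsNeeded(Map<String, Integer> operatorParallelism) {
        int max = 0;
        for (int p : operatorParallelism.values()) {
            max = Math.max(max, p);
        }
        return max;
    }

    // The job can be scheduled only if the cluster offers enough slots.
    static boolean fits(Map<String, Integer> operatorParallelism,
                        int slotsPerTaskManager, int taskManagers) {
        return slotsNeeded(operatorParallelism)
                <= totalSlots(slotsPerTaskManager, taskManagers);
    }

    public static void main(String[] args) {
        // Example pipeline: source -> map -> sink with made-up parallelism values.
        Map<String, Integer> job = new LinkedHashMap<>();
        job.put("source", 2);
        job.put("map", 8);
        job.put("sink", 1);

        System.out.println("cluster slots: " + totalSlots(4, 3));
        System.out.println("slots needed:  " + slotsNeeded(job));
        System.out.println("job fits:      " + fits(job, 4, 3));
    }
}
```

With 3 TaskManagers of 4 slots each, the cluster offers 12 slots, so this example job with a highest operator parallelism of 8 fits. Raising any operator above 12 would not, unless more TaskManagers or slots are added.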

I hope this was still helpful.

Best,
Matthias

[1] https://ci.apache.org/projects/flink/flink-docs-release-1.12/concepts/flink-architecture.html

[2] https://ci.apache.org/projects/flink/flink-docs-stable/dev/parallel.html

[3] https://ci.apache.org/projects/flink/flink-docs-stable/concepts/flink-architecture.html#task-slots-and-resources

[4] https://ci.apache.org/projects/flink/flink-docs-stable/deployment/config.html

[5] https://ci.apache.org/projects/flink/flink-docs-stable/deployment/config.html#taskmanager-numberoftaskslots

On Fri, Feb 5, 2021 at 12:22 PM Marco Villalobos <[hidden email]> wrote:

Okay, I am following up to my question. I see information regarding the threading and distribution model on the documentation about the architecture.

https://ci.apache.org/projects/flink/flink-docs-release-1.12/concepts/flink-architecture.html

Next, I want to read up on what I have control over.
