Configuring task slots and parallelism for single node Maven executed


Prez Cannady-3

Some background.

I’m running a Flink application on a single machine, instrumented by Spring Boot and launched via the Maven Spring Boot plugin. Basically, I’m trying to figure out how much I can squeeze out of a single node processing my task before committing to a cluster solution.

Couple of questions.

  1. I assume the configuration options taskmanager.numberOfTaskSlots and parallelism.default pertain to the division of work on a single node. Am I correct?
  2. Is there a way to configure these options programmatically instead of through the configuration YAML? Or some Maven tooling that can ingest a properly formatted Flink config? For the record, I’m currently trying GlobalConfiguration.getConfiguration.setInteger("<config option name>", <config option value>). I am also going to try supplying them as properties in the pom. I’m preparing some tests to see if either of these does what I expect, but thought I’d ask in case I’m heading down a rabbit hole.
  3. I figure the number of task slots is limited to the number of processors/cores/whatever available (and that the JVM can get at). Is this accurate?

Any feedback would be appreciated.


Prez Cannady  









Re: Configuring task slots and parallelism for single node Maven executed

Balaji Rajagopalan
Answered based on my understanding. 

On Mon, Apr 18, 2016 at 8:12 AM, Prez Cannady <[hidden email]> wrote:

  1. I assume the configuration options taskmanager.numberOfTaskSlots and parallelism.default pertain to division of work on a single node. Am I correct?

     You will be running a single TaskManager instance. If you are running on a 4-core machine, for example, you can set the parallelism to 4.

  2. Is there a way to configure these options programmatically instead of the configuration YAML? Or some Maven tooling that can ingest a properly formatted Flink config? For the record, I’m currently trying GlobalConfiguration.getConfiguration.setInteger("<config option name>", <config option value>). I am also going to try supplying them as properties in the pom. I’m preparing some tests to see if either of these do as I expect, but thought I’d ask in case I’m heading down a rabbit hole.

     I have been using GlobalConfiguration with no issues, but there is one thing you have to be aware of: in a clustered environment, you have to copy the YAML file to all the nodes. For example, I read the file from /usr/share/flink/conf, and I have made sure this file is available on the master node and on the task nodes as well. Why do you want to ingest the config from a Maven tool? You can do this in the main routine of your application code.

  3. I figure task slots is limited to the number of processors/cores/whatever available (and the JVM can get at). Is this accurate?
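
As a rough illustration of the sizing suggestion above (a 4-core machine gives a parallelism of 4), here is a minimal sketch in plain Java. The class and method names are hypothetical and are not part of Flink; the only real API used is the JDK's Runtime.availableProcessors(), which reports the cores the JVM can get at. The cap on parallelism assumes a single TaskManager, where a job needs at least as many slots as its highest parallelism.

```java
public class SingleNodeSizing {

    // Hypothetical helper, not a Flink API: one task slot per core, never fewer than one.
    public static int suggestedTaskSlots(int availableCores) {
        return Math.max(1, availableCores);
    }

    // Hypothetical helper. Assumption: with a single TaskManager, the job's
    // parallelism should not exceed the number of task slots, so cap it there.
    public static int cappedParallelism(int requested, int taskSlots) {
        return Math.max(1, Math.min(requested, taskSlots));
    }

    public static void main(String[] args) {
        // Number of processors available to this JVM (may be fewer than the machine has).
        int cores = Runtime.getRuntime().availableProcessors();
        int slots = suggestedTaskSlots(cores);

        System.out.println("cores visible to the JVM: " + cores);
        System.out.println("suggested taskmanager.numberOfTaskSlots: " + slots);
        System.out.println("suggested parallelism.default: " + cappedParallelism(slots, slots));
    }
}
```

The printed values are starting points for measurement on the single node, not tuned settings.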











Re: Configuring task slots and parallelism for single node Maven executed

Till Rohrmann

Hi Prez,

  1. The configuration setting taskmanager.numberOfTaskSlots specifies how many task slots a TaskManager is started with. As a rough rule of thumb, set this value to the number of cores of the machine the TM is running on. See this link [1] for further information. The configuration value parallelism.default is the default parallelism with which a program is executed if the user didn’t specify it via the submission tool or from within the program.

  2. You can configure the parallelism programmatically by calling setParallelism on the ExecutionEnvironment. The GlobalConfiguration approach won’t work in a distributed setting.

  3. see 1.
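
The fallback described in point 1 can be modeled in a few lines of plain Java. This is only a sketch of the lookup order, not Flink's implementation: the class and method names are hypothetical, and the ordering between the two explicit settings (a value set within the program wins over one given at submission) is an assumption here. Only the last step, falling back to parallelism.default when nothing else is set, comes directly from the description above.

```java
import java.util.OptionalInt;

public class ParallelismFallback {

    // Hypothetical model, not a Flink API: return the first value that is set.
    // Assumed order: program setting, then submission setting, then parallelism.default.
    public static int effectiveParallelism(OptionalInt setInProgram,
                                           OptionalInt setAtSubmission,
                                           int parallelismDefault) {
        if (setInProgram.isPresent()) {
            return setInProgram.getAsInt();
        }
        if (setAtSubmission.isPresent()) {
            return setAtSubmission.getAsInt();
        }
        return parallelismDefault;
    }

    public static void main(String[] args) {
        // Nothing set explicitly: the configured default (4 in this example) applies.
        System.out.println(effectiveParallelism(OptionalInt.empty(), OptionalInt.empty(), 4));

        // Set from within the program, as with setParallelism(2): the default is ignored.
        System.out.println(effectiveParallelism(OptionalInt.of(2), OptionalInt.empty(), 4));
    }
}
```

In a real job the program-level value would come from calling setParallelism on the ExecutionEnvironment, as point 2 describes.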

[1] https://ci.apache.org/projects/flink/flink-docs-release-1.0/concepts/concepts.html#workers-slots-resources

Cheers,
Till














Re: Configuring task slots and parallelism for single node Maven executed

Prez Cannady-3
Thank you both.  Will let you guys know how it works out.

Prez Cannady  







