After upgrade from 1.11.2 to 1.13.0 parameter taskmanager.numberOfTaskSlots set to 1.


After upgrade from 1.11.2 to 1.13.0 parameter taskmanager.numberOfTaskSlots set to 1.

Alex Drobinsky
Dear Flink community,

First, I need to provide some minimal information about my deployment scenario:
I'm running my application inside the Flink Docker image; the original Dockerfile is below:
-----------------------------------------------------------------------------------------------------------
FROM flink:1.13.0-scala_2.11-java11

# Copy log and monitoring related JARs to flink lib dir
COPY kafka-clients-2.4.1.jar /opt/flink/lib/

RUN chmod 777 /tmp
RUN apt-get update && apt-get install -y htop

# configuration files
COPY Log4cpp.properties /opt/flink/
COPY Log4j.properties /opt/flink/conf/log4j.properties
COPY SessionOrganizer.json /opt/flink/
COPY flink-conf.yaml /opt/flink/conf/
COPY slaves /opt/flink/conf/

# job file
COPY KafkaToSessions-shade.jar /opt/flink/lib/

# libraries
ADD libs /usr/local/lib/

# Add /usr/local/lib to ldconfig
RUN echo "/usr/local/lib/" > /etc/ld.so.conf.d/ips.conf && \
    ldconfig && \
    ulimit -c 0

RUN mkdir /opt/flink/ip-collection/ && \
    mkdir /opt/flink/checkpoints/ && \
    mkdir /opt/flink/ip-collection/incorrectIcs && \
    mkdir /opt/flink/ip-collection/storage && \
    mkdir /opt/flink/ip-collection/logs

CMD /opt/flink/bin/start-cluster.sh && /opt/flink/bin/flink run /opt/flink/lib/KafkaToSessions-shade.jar
-------------------------------------------------------------------------------------------------------------------------------
If we ignore the irrelevant parts of the Dockerfile, only two things remain (besides the FROM statement):
1. The overwritten flink-conf.yaml + slaves files.
2. The CMD, which runs start-cluster.sh and then the job.

My flink-conf.yaml:
---------------------------------------------------------------------------------------------------------
rest.address: localhost
rest.port: 8088
state.backend: filesystem
state.checkpoints.dir: file:///opt/flink/checkpoints
jobmanager.memory.process.size: 2224m
jobmanager.rpc.port: 6123
jobmanager.rpc.address: localhost
taskmanager.memory.flink.size: 2224m
taskmanager.memory.task.heap.size: 1000m
taskmanager.numberOfTaskSlots: 12
taskmanager.rpc.port: 50100
taskmanager.data.port: 50000
parallelism.default: 6
heartbeat.timeout: 120000
heartbeat.interval: 20000
env.java.opts: "-XX:+UseG1GC -XX:MaxGCPauseMillis=300"
---------------------------------------------------------------------------------------------------------
The slaves file contains a single line: localhost.
After the Docker container starts, the application doesn't work due to a lack of slots. When I checked flink-conf.yaml, I saw that taskmanager.numberOfTaskSlots had been set to 1.
P.S. The first time, daemon.sh complained that it didn't have write permission to change the config file; after I added chown flink.flink /opt/flink/flink-conf.yml,
it stopped complaining and the taskmanager.numberOfTaskSlots change occurred.

Any suggestions?

Best regards, 
Alexander


Re: After upgrade from 1.11.2 to 1.13.0 parameter taskmanager.numberOfTaskSlots set to 1.

Chesnay Schepler
I believe this is due to FLINK-21037; we did not consider the possibility of users mounting the configuration directly, and instead assumed that modifications to the config always go through the FLINK_PROPERTIES environment variable.

That would also be the workaround for your issue.
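To make the FLINK_PROPERTIES workaround concrete: as I understand it, the official image's docker-entrypoint.sh appends whatever is in FLINK_PROPERTIES to flink-conf.yaml before starting the process. The snippet below is a simplified sketch of that merge step, not the actual entrypoint code (the real script does more; check docker-entrypoint.sh in the flink-docker repository for details); the key values are taken from this thread.

```shell
# Sketch: how FLINK_PROPERTIES overrides reach flink-conf.yaml (simplified).
CONF_DIR=$(mktemp -d)
CONF_FILE="$CONF_DIR/flink-conf.yaml"

# Baseline config shipped in the image.
printf 'taskmanager.numberOfTaskSlots: 1\n' > "$CONF_FILE"

# User-supplied overrides, in the same "key: value" format as flink-conf.yaml.
FLINK_PROPERTIES="taskmanager.numberOfTaskSlots: 12
parallelism.default: 6"

# The entrypoint appends the properties to the config before starting Flink.
echo "$FLINK_PROPERTIES" >> "$CONF_FILE"

cat "$CONF_FILE"
```

Because the overrides are appended after the baseline values, the overriding entries end up in the file alongside the originals; the real entrypoint is responsible for making the override take effect.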

On 5/12/2021 2:06 PM, Alex Drobinsky wrote:

Re: After upgrade from 1.11.2 to 1.13.0 parameter taskmanager.numberOfTaskSlots set to 1.

Chesnay Schepler
You could also configure the number of slots via the TASK_MANAGER_NUMBER_OF_TASK_SLOTS environment variable.
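For example, in a docker-compose.yml that could look roughly like the following (the service layout here is an assumption based on this thread, not a verified setup):

```yaml
services:
  taskmanager:
    image: flink:1.13.0-scala_2.11-java11
    environment:
      - TASK_MANAGER_NUMBER_OF_TASK_SLOTS=12
    command: taskmanager
```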

On 5/12/2021 2:19 PM, Chesnay Schepler wrote:

Re: After upgrade from 1.11.2 to 1.13.0 parameter taskmanager.numberOfTaskSlots set to 1.

Alex Drobinsky
Thanks a lot!
I used TASK_MANAGER_NUMBER_OF_TASK_SLOTS in my docker-compose.yml, and it works perfectly :)
In which format can I provide parameters via FLINK_PROPERTIES? I'm thinking of abandoning the idea of copying flink-conf in the Dockerfile.
Is it limited to a specific set of parameters, or is it generic?

On Wed, 12 May 2021 at 15:20, Chesnay Schepler <[hidden email]> wrote:
Re: After upgrade from 1.11.2 to 1.13.0 parameter taskmanager.numberOfTaskSlots set to 1.

Chesnay Schepler

The contents of FLINK_PROPERTIES are piped as-is into the Flink configuration, and thus require the same format as the configuration.
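So, for instance, one could pass a multi-line block in the same "key: value" syntax that flink-conf.yaml uses. A docker-compose sketch (keys and values taken from this thread; treat the service layout as an assumption):

```yaml
services:
  jobmanager:
    image: flink:1.13.0-scala_2.11-java11
    environment:
      FLINK_PROPERTIES: |
        taskmanager.numberOfTaskSlots: 12
        parallelism.default: 6
        env.java.opts: "-XX:+UseG1GC -XX:MaxGCPauseMillis=300"
```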


On 5/12/2021 2:36 PM, Alex Drobinsky wrote:

Re: After upgrade from 1.11.2 to 1.13.0 parameter taskmanager.numberOfTaskSlots set to 1.

Alexey Trenikhun
If flink-conf.yaml is read-only, will Flink complain but still work fine?


From: Chesnay Schepler <[hidden email]>
Sent: Wednesday, May 12, 2021 5:38 AM
To: Alex Drobinsky <[hidden email]>
Cc: [hidden email] <[hidden email]>
Subject: Re: After upgrade from 1.11.2 to 1.13.0 parameter taskmanager.numberOfTaskSlots set to 1.
 
