(DEPRECATED) Apache Flink User Mailing List archive.

Flink Docker job fails to launch

4 messages

Manas Kale

Flink Docker job fails to launch

Hi all,

I've got a job that I am trying to run using Docker, as described in [1].

Here's the Dockerfile:

# Start from base Flink image.
FROM flink:1.11.0

# Add fat JAR and logger properties file to image.
ADD ./target/flink_POC-0.1.jar /opt/flink/usrlib/flink_POC-0.1.jar
ADD ./target/classes/log4j.properties /opt/flink/usrlib/log4j.properties

# Add pipeline.properties and its location.
ADD target/classes/pipeline.properties /opt/flink/usrlib/pipeline.properties
ENV FLINK_CONFIG_LOCATION=/opt/flink/usrlib/pipeline.properties


EXPOSE 8081

And the script I use to launch it:

#!/usr/bin/env bash

echo "Building docker image..."
docker build --tag flink_pipeline .

echo "Configuring Flink runtime..."
export FLINK_PROPERTIES="jobmanager.rpc.address: host
 taskmanager.memory.process.size: 4000m
 jobmanager.memory.process.size: 4000m
 "

echo "Starting docker image..."
docker run --rm -p 8081:8081 --env FLINK_PROPERTIES="${FLINK_PROPERTIES}" \
flink_pipeline standalone-job --job-classname flink_POC.StreamingJob

When I run the script, my job stays in the "CREATED" state, and after some time I get this error:

2021-01-15 10:44:29,563 INFO org.apache.flink.runtime.jobmaster.slotpool.SlotPoolImpl [] - Requesting new slot [SlotRequestId{1c25a61e6179f66b112b1944740f1a11}] and profile ResourceProfile{UNKNOWN} from resource manager.
2021-01-15 10:44:29,565 INFO org.apache.flink.runtime.resourcemanager.StandaloneResourceManager [] - Request slot with profile ResourceProfile{UNKNOWN} for job b854f75d6029e1725e822721c30095d7 with allocation id edc1e29d229aceb82f75b7c5835eca3c.
2021-01-15 10:46:39,604 INFO org.apache.flink.runtime.jobmaster.slotpool.SlotPoolImpl [] - Failing pending slot request [SlotRequestId{1c25a61e6179f66b112b1944740f1a11}]: Could not fulfill slot request edc1e29d229aceb82f75b7c5835eca3c. Requested resource profile (ResourceProfile{UNKNOWN}) is unfulfillable.
2021-01-15 10:46:39,667 INFO org.apache.flink.runtime.executiongraph.ExecutionGraph [] - Source: advanced features kafak consumer (1/1) (49ea271f6b9881d82c49b2826e8584d9) switched from SCHEDULED to FAILED on not deployed.
java.util.concurrent.CompletionException: org.apache.flink.runtime.resourcemanager.exceptions.UnfulfillableSlotRequestException: Could not fulfill slot request edc1e29d229aceb82f75b7c5835eca3c. Requested resource profile (ResourceProfile{UNKNOWN}) is unfulfillable.
...

As far as I can tell, the ResourceManager fails to provide resources for my job, but beyond that the error is quite cryptic to me. Could anyone help me understand what is going wrong?

Regards,

Manas

[1] https://ci.apache.org/projects/flink/flink-docs-release-1.12/deployment/resource-providers/standalone/docker.html#introduction

Chesnay Schepler

Re: Flink Docker job fails to launch

Where are you starting the task executor?
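One way to check this from the host is to query the JobManager's REST API (published on port 8081 by the launch script) for registered TaskManagers; an empty list means no task executor is available to offer slots. A minimal sketch, assuming the `/taskmanagers` REST endpoint returns compact JSON in which an empty result appears literally as `"taskmanagers":[]`; the helper name `has_taskmanagers` is hypothetical:

```shell
#!/usr/bin/env bash
# Hypothetical helper: reads the JSON returned by the JobManager's
# /taskmanagers REST endpoint on stdin and succeeds (exit status 0)
# only if at least one TaskManager is listed.
# Assumption: the response is compact JSON, so an empty list appears
# literally as "taskmanagers":[].
has_taskmanagers() {
  local body
  body="$(cat)"
  # Fail if the response is empty or lacks the expected key.
  [[ "$body" == *'"taskmanagers":'* ]] || return 1
  # Fail if the list of TaskManagers is empty.
  [[ "$body" != *'"taskmanagers":[]'* ]]
}

# Usage against a running cluster:
#   curl -s http://localhost:8081/taskmanagers | has_taskmanagers \
#     || echo "No TaskManager has registered with the JobManager"
```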

On 1/15/2021 11:57 AM, Manas Kale wrote:


Manas Kale

Re: Flink Docker job fails to launch

You mean the TaskManager? I tried this command:

docker run --env FLINK_PROPERTIES="${FLINK_PROPERTIES}" flink_pipeline taskmanager

after running the script above, but got:

2021-01-15 13:03:05,069 INFO org.apache.flink.runtime.util.LeaderRetrievalUtils [] - Trying to select the network interface and address to use by connecting to the leading JobManager.
2021-01-15 13:03:05,069 INFO org.apache.flink.runtime.util.LeaderRetrievalUtils [] - TaskManager will try to connect for PT10S before falling back to heuristics
2021-01-15 13:03:05,484 INFO org.apache.flink.runtime.net.ConnectionUtils [] - Trying to connect to address jobmanager:6123
2021-01-15 13:03:05,486 INFO org.apache.flink.runtime.net.ConnectionUtils [] - Failed to connect from address '608ecee74cff/172.17.0.3': jobmanager

Here's what I understand is supposed to happen:

1. Start a jobmanager in a docker container.

2. Start a taskmanager in another docker container and tell it where to find the jobmanager.

3. Using the taskmanager, submit a new job.

I assumed that since step (1) is failing, adding the next step (starting the TaskManager) would be of no use.

Please correct me if my understanding is wrong.

On Fri, Jan 15, 2021 at 4:37 PM Chesnay Schepler <[hidden email]> wrote:


Chesnay Schepler

Re: Flink Docker job fails to launch

The standalone-job process fails because no task executors are around to request slots from.
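In application mode, the `standalone-job` container only runs the JobManager; TaskManagers are started as separate containers that must be able to reach it. The TaskManager log above fails on `jobmanager:6123` because the containers do not share a network on which the name `jobmanager` resolves. A minimal sketch of a launch script, loosely following the pattern on the Docker deployment page linked as [1]: it reuses the `flink_pipeline` image and `flink_POC.StreamingJob` class from this thread, assumes a user-defined network called `flink-network` and a JobManager container named `jobmanager` (both illustrative names), runs both containers detached, and lets you set `DOCKER=echo` to print the commands instead of running them:

```shell
#!/usr/bin/env bash
# Sketch: run the job's JobManager (standalone-job) and one TaskManager on a
# shared user-defined Docker network, so the TaskManager can resolve the
# JobManager by its container name.
# Assumptions (illustrative, not from the thread): network "flink-network",
# container name "jobmanager". Set DOCKER=echo to print the commands instead.
start_application_cluster() {
  local docker="${DOCKER:-docker}"
  local image="flink_pipeline"
  local network="flink-network"
  # Both containers must agree on where the JobManager lives.
  local props="jobmanager.rpc.address: jobmanager"

  # Create the network; tolerate "already exists" on re-runs.
  "$docker" network create "$network" || true

  # JobManager: starts the job from the JAR in /opt/flink/usrlib and
  # waits for TaskManagers to offer slots.
  "$docker" run -d --rm --name jobmanager --network "$network" -p 8081:8081 \
    --env FLINK_PROPERTIES="$props" \
    "$image" standalone-job --job-classname flink_POC.StreamingJob

  # TaskManager: joins the same network and registers with "jobmanager".
  "$docker" run -d --rm --network "$network" \
    --env FLINK_PROPERTIES="$props" \
    "$image" taskmanager
}
```

Because both containers run detached, the JobManager's output can be followed with `docker logs -f jobmanager`.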

It _should_ wait for a while to give task executors time to start up; that wait is controlled by resourcemanager.standalone.start-up-time or, if that is unset, slot.request.timeout.
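If the TaskManager containers need longer to come up than the default wait allows, the start-up period can be set explicitly through `FLINK_PROPERTIES`. A minimal sketch, assuming that in the 1.11/1.12 releases the option takes a plain millisecond value like `slot.request.timeout`; the helper name `flink_properties`, the `jobmanager` address, and the 600000 ms default are illustrative only:

```shell
#!/usr/bin/env bash
# Hypothetical helper: prints a FLINK_PROPERTIES value ("key: value" per line)
# that also lengthens the standalone ResourceManager's start-up period.
# Assumption: resourcemanager.standalone.start-up-time is in milliseconds.
flink_properties() {
  local startup_ms="${1:-600000}"   # illustrative default: 10 minutes
  printf '%s\n' \
    "jobmanager.rpc.address: jobmanager" \
    "resourcemanager.standalone.start-up-time: ${startup_ms}"
}

# Usage: pass the same value to every container, e.g.
#   FLINK_PROPERTIES="$(flink_properties 600000)"
#   docker run ... --env FLINK_PROPERTIES="${FLINK_PROPERTIES}" ...
```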

Does the standalone-job process fail immediately?

On 1/15/2021 2:28 PM, Manas Kale wrote:
