Flink Docker job fails to launch

Manas Kale
Hi all,
I've got a job that I'm trying to run with Docker as described in [1].
Here's the dockerfile:
# Start from base Flink image.
FROM flink:1.11.0

# Add the fat JAR and the logger properties file to the image.
COPY ./target/flink_POC-0.1.jar /opt/flink/usrlib/flink_POC-0.1.jar
COPY ./target/classes/log4j.properties /opt/flink/usrlib/log4j.properties

# Add pipeline.properties and point the job at its location.
COPY ./target/classes/pipeline.properties /opt/flink/usrlib/pipeline.properties
ENV FLINK_CONFIG_LOCATION=/opt/flink/usrlib/pipeline.properties

EXPOSE 8081

And the script I use to launch it:
#!/usr/bin/env bash

echo "Building docker image..."
docker build --tag flink_pipeline .

echo "Configuring Flink runtime..."
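# Memory sizes get an explicit unit; Flink parses a bare number as bytes.
export FLINK_PROPERTIES="jobmanager.rpc.address: host
taskmanager.memory.process.size: 4000m
jobmanager.memory.process.size: 4000m
"

echo "Starting docker image..."
# Pass the variable's value ("${FLINK_PROPERTIES}"), not the literal string FLINK_PROPERTIES.
docker run --rm -p 8081:8081 --env FLINK_PROPERTIES="${FLINK_PROPERTIES}" \
flink_pipeline standalone-job --job-classname flink_POC.StreamingJob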

When I run the script, my job is stuck in the "CREATED" state, and after some time I get this error:

2021-01-15 10:44:29,563 INFO  org.apache.flink.runtime.jobmaster.slotpool.SlotPoolImpl     [] - Requesting new slot [SlotRequestId{1c25a61e6179f66b112b1944740f1a11}] and profile ResourceProfile{UNKNOWN} from resource manager.
2021-01-15 10:44:29,565 INFO  org.apache.flink.runtime.resourcemanager.StandaloneResourceManager [] - Request slot with profile ResourceProfile{UNKNOWN} for job b854f75d6029e1725e822721c30095d7 with allocation id edc1e29d229aceb82f75b7c5835eca3c.
2021-01-15 10:46:39,604 INFO  org.apache.flink.runtime.jobmaster.slotpool.SlotPoolImpl     [] - Failing pending slot request [SlotRequestId{1c25a61e6179f66b112b1944740f1a11}]: Could not fulfill slot request edc1e29d229aceb82f75b7c5835eca3c. Requested resource profile (ResourceProfile{UNKNOWN}) is unfulfillable.
2021-01-15 10:46:39,667 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph       [] - Source: advanced features  kafak consumer (1/1) (49ea271f6b9881d82c49b2826e8584d9) switched from SCHEDULED to FAILED on not deployed.
java.util.concurrent.CompletionException: org.apache.flink.runtime.resourcemanager.exceptions.UnfulfillableSlotRequestException: Could not fulfill slot request edc1e29d229aceb82f75b7c5835eca3c. Requested resource profile (ResourceProfile{UNKNOWN}) is unfulfillable.
      ...
I understand that the ResourceManager fails to provide resources for my job(?), but beyond that the error is quite cryptic to me. Could anyone help me understand what is going wrong?


Regards,
Manas

Re: Flink Docker job fails to launch

Chesnay Schepler
Where are you starting the task executor?
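One way to see whether any task executor has registered is the `/overview` endpoint of Flink's REST API, which reports `taskmanagers` and `slots-total`; zero slots would be consistent with the `UnfulfillableSlotRequestException` above. A minimal sketch, assuming `curl` is available, port 8081 is published on localhost as in the launch script, and the check runs while the job is still waiting for slots. The function names `slots_total` and `check_task_executors` are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: ask the JobManager's REST API whether any task executor has registered.
# Assumes the REST port 8081 is published on localhost, as in the launch script.

# Read the JSON returned by GET /overview on stdin and print its "slots-total" value.
slots_total() {
  sed -n 's/.*"slots-total":[[:space:]]*\([0-9][0-9]*\).*/\1/p'
}

# Print the number of registered slots, or fail if there are none.
check_task_executors() {
  local url="${1:-http://localhost:8081}"
  local total
  total="$(curl -s "${url}/overview" | slots_total)"
  if [ "${total:-0}" -eq 0 ]; then
    echo "No task slots registered at ${url}; start a TaskManager."
    return 1
  fi
  echo "${total} task slot(s) registered at ${url}."
}
```

For example, `check_task_executors http://localhost:8081` prints the slot count, or returns a non-zero status when no TaskManager has registered.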

On 1/15/2021 11:57 AM, Manas Kale wrote:

Re: Flink Docker job fails to launch

Manas Kale
You mean the TaskManager? I tried this command:

docker run --env FLINK_PROPERTIES="${FLINK_PROPERTIES}" flink_pipeline taskmanager

after running the above script, but got:

2021-01-15 13:03:05,069 INFO  org.apache.flink.runtime.util.LeaderRetrievalUtils           [] - Trying to select the network interface and address to use by connecting to the leading JobManager.
2021-01-15 13:03:05,069 INFO  org.apache.flink.runtime.util.LeaderRetrievalUtils           [] - TaskManager will try to connect for PT10S before falling back to heuristics
2021-01-15 13:03:05,484 INFO  org.apache.flink.runtime.net.ConnectionUtils                 [] - Trying to connect to address jobmanager:6123
2021-01-15 13:03:05,486 INFO  org.apache.flink.runtime.net.ConnectionUtils                 [] - Failed to connect from address '608ecee74cff/172.17.0.3': jobmanager

Here's what I understand is supposed to happen:
1. Start a JobManager in a Docker container.
2. Start a TaskManager in another Docker container and tell it where to find the JobManager.
3. Using the TaskManager, submit a new job.

I thought that since step (1) was failing, adding the next step (starting a TaskManager) would be of no use.

Please correct me if my understanding is wrong.
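A minimal sketch of steps 1 and 2, modelled on the Application Mode example in the Flink Docker documentation. It assumes a user-defined Docker network named `flink-network` and a JobManager container named `jobmanager`, neither of which is in the original script; on Docker's default bridge network containers cannot resolve each other by name, which fits the `Failed to connect ... jobmanager` log line above. In `standalone-job` mode the JobManager itself starts the job found in `/opt/flink/usrlib`, so step 3 should not be needed. The `run` helper and its `DRY_RUN` switch are hypothetical conveniences for printing the commands instead of executing them.

```shell
#!/usr/bin/env bash
# Sketch: run the JobManager and one TaskManager on a shared user-defined network.
# Assumptions (not from the original script): the network name "flink-network",
# the container name "jobmanager", and the DRY_RUN switch for printing commands.

FLINK_PROPERTIES="jobmanager.rpc.address: jobmanager
taskmanager.memory.process.size: 4000m
jobmanager.memory.process.size: 4000m"

# Execute a command, or only print it when DRY_RUN=1.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Containers on a user-defined network can resolve each other by container name.
create_network() {
  run docker network create flink-network
}

# Step 1: the JobManager; its container name must match jobmanager.rpc.address.
start_jobmanager() {
  run docker run -d --rm --name jobmanager --network flink-network -p 8081:8081 \
    --env "FLINK_PROPERTIES=${FLINK_PROPERTIES}" \
    flink_pipeline standalone-job --job-classname flink_POC.StreamingJob
}

# Step 2: a TaskManager that registers with the JobManager and offers task slots.
start_taskmanager() {
  run docker run -d --rm --network flink-network \
    --env "FLINK_PROPERTIES=${FLINK_PROPERTIES}" \
    flink_pipeline taskmanager
}
```

Source the file, then call `create_network`, `start_jobmanager` and `start_taskmanager` in that order; `DRY_RUN=1 start_taskmanager` only prints the command.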




On Fri, Jan 15, 2021 at 4:37 PM Chesnay Schepler <[hidden email]> wrote:

Re: Flink Docker job fails to launch

Chesnay Schepler
The standalone-job process fails because no task executors are around to request slots from.
It _should_ wait for a bit to give task executors time to start up, controlled via resourcemanager.standalone.start-up-time or, if unset, slot.request.timeout.
Does the standalone-job process fail immediately?
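Both options can be passed through the same `FLINK_PROPERTIES` variable the launch script already uses. A minimal sketch, assuming the lines are appended to the existing properties before the JobManager container starts; both options take milliseconds, 120000 is an arbitrary illustrative value, and `STARTUP_PROPERTIES` is a hypothetical name. This only lengthens the wait: a TaskManager still has to register for the slot request to succeed.

```shell
#!/usr/bin/env bash
# Sketch: lengthen the window in which the standalone ResourceManager waits for
# task executors to register. Values are in milliseconds and illustrative only;
# 300000 is the default of slot.request.timeout.
STARTUP_PROPERTIES="resourcemanager.standalone.start-up-time: 120000
slot.request.timeout: 300000"

# Append them (one "key: value" per line) to whatever properties are already set.
FLINK_PROPERTIES="${FLINK_PROPERTIES:-}
${STARTUP_PROPERTIES}"
export FLINK_PROPERTIES
```

The result is then passed to the JobManager container as before, e.g. `--env FLINK_PROPERTIES="${FLINK_PROPERTIES}"`.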

On 1/15/2021 2:28 PM, Manas Kale wrote: