Hi all,

I've got a job that I am trying to run using docker as per [1]. Here's the dockerfile:

# Start from base Flink image.

And the script I use to launch it:

#!/usr/bin/env bash

When I run the script, I see my job stuck in "CREATED" state and after some time I get the error:

2021-01-15 10:44:29,563 INFO org.apache.flink.runtime.jobmaster.slotpool.SlotPoolImpl [] - Requesting new slot [SlotRequestId{1c25a61e6179f66b112b1944740f1a11}] and profile ResourceProfile{UNKNOWN} from resource manager.
2021-01-15 10:44:29,565 INFO org.apache.flink.runtime.resourcemanager.StandaloneResourceManager [] - Request slot with profile ResourceProfile{UNKNOWN} for job b854f75d6029e1725e822721c30095d7 with allocation id edc1e29d229aceb82f75b7c5835eca3c.
2021-01-15 10:46:39,604 INFO org.apache.flink.runtime.jobmaster.slotpool.SlotPoolImpl [] - Failing pending slot request [SlotRequestId{1c25a61e6179f66b112b1944740f1a11}]: Could not fulfill slot request edc1e29d229aceb82f75b7c5835eca3c. Requested resource profile (ResourceProfile{UNKNOWN}) is unfulfillable.
2021-01-15 10:46:39,667 INFO org.apache.flink.runtime.executiongraph.ExecutionGraph [] - Source: advanced features kafak consumer (1/1) (49ea271f6b9881d82c49b2826e8584d9) switched from SCHEDULED to FAILED on not deployed.
java.util.concurrent.CompletionException: org.apache.flink.runtime.resourcemanager.exceptions.UnfulfillableSlotRequestException: Could not fulfill slot request edc1e29d229aceb82f75b7c5835eca3c. Requested resource profile (ResourceProfile{UNKNOWN}) is unfulfillable.
...

I understand that the resourcemanager fails to provide resources for my job(?), but other than that the error is quite cryptic for me. Could anyone help me understand what is going wrong?

Regards,
Manas
Where are you starting the task executor?

On 1/15/2021 11:57 AM, Manas Kale wrote:
You mean taskmanager? I tried using this command:

docker run --env FLINK_PROPERTIES="${FLINK_PROPERTIES}" flink_pipeline taskmanager

after running the above script, but got:

2021-01-15 13:03:05,069 INFO org.apache.flink.runtime.util.LeaderRetrievalUtils [] - Trying to select the network interface and address to use by connecting to the leading JobManager.
2021-01-15 13:03:05,069 INFO org.apache.flink.runtime.util.LeaderRetrievalUtils [] - TaskManager will try to connect for PT10S before falling back to heuristics
2021-01-15 13:03:05,484 INFO org.apache.flink.runtime.net.ConnectionUtils [] - Trying to connect to address jobmanager:6123
2021-01-15 13:03:05,486 INFO org.apache.flink.runtime.net.ConnectionUtils [] - Failed to connect from address '608ecee74cff/172.17.0.3': jobmanager

Here's what I understand is supposed to happen:

1. Start a jobmanager in a docker container.
2. Start a taskmanager in another docker container and tell it where to find the jobmanager.
3. Using the taskmanager, submit a new job.

I thought that since step (1) is failing, adding the next step (starting the taskmanager) would be of no use. Please correct me if my understanding is wrong.

On Fri, Jan 15, 2021 at 4:37 PM Chesnay Schepler <[hidden email]> wrote:
The standalone-job process fails because no task executors are around to request slots from.

It _should_ wait for a bit to give task executors time to start up, controlled via resourcemanager.standalone.start-up-time or, if unset, slot.request.timeout.

Does the standalone-job process fail immediately?
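
For illustration, here is a rough sketch of how those two options could be appended to the FLINK_PROPERTIES variable that is already being passed to the containers. The values are only assumptions chosen for the example (both options take milliseconds), not recommendations, and the rest of your launch script is not shown here, so adapt it as needed.

```shell
# Sketch only: the values below are illustrative assumptions, in milliseconds.
# Keeps whatever FLINK_PROPERTIES already contains and appends two lines.
FLINK_PROPERTIES="${FLINK_PROPERTIES:+${FLINK_PROPERTIES}
}resourcemanager.standalone.start-up-time: 120000
slot.request.timeout: 300000"

# Print the result to check it before passing it to the containers.
printf '%s\n' "${FLINK_PROPERTIES}"
```

The variable would then be handed to each container the same way as in the command quoted above, i.e. with --env FLINK_PROPERTIES="${FLINK_PROPERTIES}".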
On 1/15/2021 2:28 PM, Manas Kale wrote: