Hi,
I am trying to run a Flink cluster with a job-manager and a task-manager in Docker, one of each for now. I get a StandaloneResourceManager error stating that it cannot associate with the job-manager, and I do not know what is wrong.

I am sure that the job-manager is reachable:
===============================
root@flink-jobmanager:/opt/flink# telnet flink_jobmanager 32929
Trying 172.18.0.3...
Connected to flink-jobmanager.
Escape character is '^]'.
Connection closed by foreign host.
===============================
Here is my config:
===============================
Starting Job Manager
config file:
jobmanager.rpc.address: flink_jobmanager
jobmanager.rpc.port: 6123
jobmanager.web.port: 8081
jobmanager.heap.mb: 1024
taskmanager.heap.mb: 1024
taskmanager.numberOfTaskSlots: 1
taskmanager.memory.preallocate: false
parallelism.default: 1
jobmanager.archive.fs.dir: file:///flink_data/completed-jobs/
historyserver.archive.fs.dir: file:///flink_data/completed-jobs/
state.backend: rocksdb
state.backend.fs.checkpointdir: file:///flink_data/checkpoints
taskmanager.tmp.dirs: /flink_data/tmp
blob.storage.directory: /flink_data/tmp
jobmanager.web.tmpdir: /flink_data/tmp
env.log.dir: /flink_data/logs
high-availability: zookeeper
high-availability.storageDir: file:///flink_data/ha/
high-availability.zookeeper.quorum: kafka:2181
blob.server.port: 6124
query.server.port: 6125
===============================
Here is the major error I see:
===============================
2017-08-16 02:46:23,586 INFO org.apache.flink.runtime.leaderretrieval.ZooKeeperLeaderRetrievalService - Starting ZooKeeperLeaderRetrievalService.
2017-08-16 02:46:23,612 INFO org.apache.flink.runtime.jobmanager.JobManager - JobManager akka.tcp://flink@flink_jobmanager:32929/user/jobmanager was granted leadership with leader session ID Some(06abc8f5-c1b9-44b2-bb7f-771c74981552).
2017-08-16 02:46:23,627 INFO org.apache.flink.runtime.jobmanager.JobManager - Delaying recovery of all jobs by 10000 milliseconds.
2017-08-16 02:46:23,638 INFO org.apache.flink.runtime.webmonitor.JobManagerRetriever - New leader reachable under akka.tcp://flink@flink_jobmanager:32929/user/jobmanager:06abc8f5-c1b9-44b2-bb7f-771c74981552.
2017-08-16 02:46:23,640 INFO org.apache.flink.runtime.clusterframework.standalone.StandaloneResourceManager - Trying to associate with JobManager leader akka.tcp://flink@flink_jobmanager:32929/user/jobmanager
2017-08-16 02:46:23,653 WARN org.apache.flink.runtime.webmonitor.JobManagerRetriever - Failed to retrieve leader gateway and port.
akka.actor.ActorNotFound: Actor not found for: ActorSelection[Anchor(akka://flink/deadLetters), Path(/)]
    at akka.actor.ActorSelection$$anonfun$resolveOne$1.apply(ActorSelection.scala:65)
    at akka.actor.ActorSelection$$anonfun$resolveOne$1.apply(ActorSelection.scala:63)
    at scala.concurrent.impl.CallbackRunnable.run(Promise.scala:32)
    at akka.dispatch.BatchingExecutor$AbstractBatch.processBatch(BatchingExecutor.scala:55)
    at akka.dispatch.BatchingExecutor$Batch.run(BatchingExecutor.scala:73)
    at akka.dispatch.ExecutionContexts$sameThreadExecutionContext$.unbatchedExecute(Future.scala:74)
    at akka.dispatch.BatchingExecutor$class.execute(BatchingExecutor.scala:120)
    at akka.dispatch.ExecutionContexts$sameThreadExecutionContext$.execute(Future.scala:73)
    at scala.concurrent.impl.CallbackRunnable.executeWithValue(Promise.scala:40)
    at scala.concurrent.impl.Promise$DefaultPromise.scala$concurrent$impl$Promise$DefaultPromise$$dispatchOrAddCallback(Promise.scala:280)
    at scala.concurrent.impl.Promise$DefaultPromise.onComplete(Promise.scala:270)
    at akka.actor.ActorSelection.resolveOne(ActorSelection.scala:63)
    at org.apache.flink.runtime.akka.AkkaUtils$.getActorRefFuture(AkkaUtils.scala:498)
    at org.apache.flink.runtime.akka.AkkaUtils.getActorRefFuture(AkkaUtils.scala)
    at org.apache.flink.runtime.webmonitor.JobManagerRetriever.notifyLeaderAddress(JobManagerRetriever.java:141)
    at org.apache.flink.runtime.leaderretrieval.ZooKeeperLeaderRetrievalService.nodeChanged(ZooKeeperLeaderRetrievalService.java:168)
    at org.apache.flink.shaded.org.apache.curator.framework.recipes.cache.NodeCache$4.apply(NodeCache.java:310)
    at org.apache.flink.shaded.org.apache.curator.framework.recipes.cache.NodeCache$4.apply(NodeCache.java:304)
    at org.apache.flink.shaded.org.apache.curator.framework.listen.ListenerContainer$1.run(ListenerContainer.java:93)
    at org.apache.flink.shaded.org.apache.curator.shaded.com.google.common.util.concurrent.MoreExecutors$SameThreadExecutorService.execute(MoreExecutors.java:297)
    at org.apache.flink.shaded.org.apache.curator.framework.listen.ListenerContainer.forEach(ListenerContainer.java:85)
    at org.apache.flink.shaded.org.apache.curator.framework.recipes.cache.NodeCache.setNewData(NodeCache.java:302)
    at org.apache.flink.shaded.org.apache.curator.framework.recipes.cache.NodeCache.processBackgroundResult(NodeCache.java:269)
    at org.apache.flink.shaded.org.apache.curator.framework.recipes.cache.NodeCache.access$300(NodeCache.java:56)
    at org.apache.flink.shaded.org.apache.curator.framework.recipes.cache.NodeCache$3.processResult(NodeCache.java:122)
    at org.apache.flink.shaded.org.apache.curator.framework.imps.CuratorFrameworkImpl.sendToBackgroundCallback(CuratorFrameworkImpl.java:749)
    at org.apache.flink.shaded.org.apache.curator.framework.imps.CuratorFrameworkImpl.processBackgroundOperation(CuratorFrameworkImpl.java:522)
    at org.apache.flink.shaded.org.apache.curator.framework.imps.GetDataBuilderImpl$3.processResult(GetDataBuilderImpl.java:257)
    at org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:561)
    at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:498)
2017-08-16 02:46:33,644 INFO org.apache.flink.runtime.jobmanager.JobManager - Attempting to recover all jobs.
2017-08-16 02:46:33,648 INFO org.apache.flink.runtime.jobmanager.JobManager - There are no jobs to recover.
===============================
More detailed log: https://gist.github.com/zenhao/19926402438f613c331ffe5b6e6e005d
Hi Hao Sun,

have you checked that the hostname flink_jobmanager can be resolved from within the container? This is required to connect to the JobManager. If the hostname does resolve, then log files at DEBUG log level would be helpful to track down the problem.

Cheers,
Till

On Wed, Aug 16, 2017 at 5:35 AM, Hao Sun <[hidden email]> wrote:
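The hostname check suggested above can be scripted from inside the container. The following is only a minimal sketch in Python, assuming a Python interpreter is available in the container image; the helper name `can_resolve` is hypothetical, and the two hostnames are taken from the telnet output in the original post. Note that a name resolving successfully (as it did in that telnet session) does not by itself prove the JobManager address is usable.

```python
import socket


def can_resolve(hostname):
    """Return the IPv4 address the hostname resolves to, or None on failure.

    Hypothetical helper for illustration; it only tests name resolution,
    not whether anything is listening on the JobManager port.
    """
    try:
        return socket.gethostbyname(hostname)
    except (OSError, UnicodeError):
        # socket.gaierror and socket.herror are subclasses of OSError.
        return None


if __name__ == "__main__":
    # Both spellings appear in the original post's telnet session.
    for name in ("flink_jobmanager", "flink-jobmanager"):
        address = can_resolve(name)
        print(f"{name}: {address if address else 'NOT RESOLVABLE'}")
```

Running it inside each container shows which spelling, if any, the container's resolver accepts.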
Thanks Till, the DEBUG log level is a good idea. I figured it out: I made a mistake with `-` and `_`.

On Tue, Aug 22, 2017 at 1:39 AM Till Rohrmann <[hidden email]> wrote:
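A `-` versus `_` mix-up like the one described above can be caught before the cluster starts. Underscores are not valid hostname characters under RFC 952 and RFC 1123, and some libraries reject such names even when DNS resolves them; whether that was the exact failure here is not stated in the thread. The sketch below is an assumption-laden illustration in Python: the function names `is_valid_hostname` and `find_suspect_hosts` are hypothetical, and the parser only handles the flat `key: value` lines shown in the posted config.

```python
import re

# One DNS label per RFC 1123: letters, digits and hyphens, 1 to 63 characters,
# with no hyphen at the start or the end. Underscores are not allowed.
_LABEL = re.compile(r"(?!-)[A-Za-z0-9-]{1,63}(?<!-)")


def is_valid_hostname(name):
    """Return True if every dot-separated label is a valid RFC 1123 label."""
    if not name or len(name) > 253:
        return False
    return all(_LABEL.fullmatch(label) for label in name.split("."))


def find_suspect_hosts(config_text, keys=("jobmanager.rpc.address",)):
    """Return {key: value} for the given keys whose value is not a valid hostname.

    Assumes flat "key: value" lines as in the config posted in this thread.
    """
    suspects = {}
    for line in config_text.splitlines():
        key, sep, value = line.partition(":")
        if not sep:
            continue  # not a key/value line
        key, value = key.strip(), value.strip()
        if key in keys and not is_valid_hostname(value):
            suspects[key] = value
    return suspects


if __name__ == "__main__":
    sample = "jobmanager.rpc.address: flink_jobmanager\njobmanager.rpc.port: 6123"
    print(find_suspect_hosts(sample))
```

With the config from the original post, this flags `jobmanager.rpc.address` because of the underscore in `flink_jobmanager`.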