Hi Hu,

I am not sure why you need to start multiple jobmanagers on Kubernetes. As in the manual [1], we use a deployment with a replica count of 1 so that Kubernetes detects a crash of the jobmanager and starts a new one. What we should do is add the high availability configuration [2] to flink-conf.yaml. You could use a ConfigMap [3] to store your flink-conf.yaml and then mount it into the jobmanager pod. Alternatively, you could update the flink-conf.yaml in your Flink image.

胡逸才 <[hidden email]> wrote on Friday, June 28, 2019 at 11:09 AM:

Hi Tan:

I have the same problem as you when running in "flink-1.7.2 on Kubernetes HA" mode. May I ask whether you have solved this problem, and if so, how? After I started the two jobmanagers normally, I tried to kill one of them, and it could not restart normally. Both jobmanagers reported this error. The specific log is as follows:

2019-06-28 09:57:57.253 [flink-akka.actor.default-dispatcher-4] WARN akka.remote.transport.netty.NettyTransport New I/O boss #3 - Remote connection to [null] failed with java.net.ConnectException: Connection refused: tdh2/192.168.208.55:56529
2019-06-28 09:57:57.253 [flink-akka.actor.default-dispatcher-4] WARN akka.remote.ReliableDeliverySupervisor flink-akka.remote.default-remote-dispatcher-14 - Association with remote system [akka.tcp://flink@tdh2:56529] has failed, address is now gated for [50] ms. Reason: [Association failed with [akka.tcp://flink@tdh2:56529]] Caused by: [Connection refused: tdh2/192.168.208.55:56529]
2019-06-28 09:57:57.253 [flink-akka.actor.default-dispatcher-4] WARN akka.remote.ReliableDeliverySupervisor flink-akka.remote.default-remote-dispatcher-14 - Association with remote system [akka.tcp://flink@tdh2:56529] has failed, address is now gated for [50] ms. Reason: [Association failed with [akka.tcp://flink@tdh2:56529]] Caused by: [Connection refused: tdh2/192.168.208.55:56529]
2019-06-28 09:57:57.260 [flink-rest-server-netty-worker-thread-7] ERROR o.a.f.r.rest.handler.legacy.files.StaticFileServerHandler - Could not retrieve the redirect address.
java.util.concurrent.CompletionException: akka.pattern.AskTimeoutException: Ask timed out on [Actor[akka.tcp://flink@tdh2:56529/user/dispatcher#299521377]] after [10000 ms]. Sender[null] sent message of type "org.apache.flink.runtime.rpc.messages.RemoteFencedMessage".
    at java.util.concurrent.CompletableFuture.encodeThrowable(CompletableFuture.java:292)
    at java.util.concurrent.CompletableFuture.completeThrowable(CompletableFuture.java:308)
    at java.util.concurrent.CompletableFuture.uniApply(CompletableFuture.java:593)
    at java.util.concurrent.CompletableFuture$UniApply.tryFire(CompletableFuture.java:577)
    at java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:474)
    at java.util.concurrent.CompletableFuture.completeExceptionally(CompletableFuture.java:1977)
    at org.apache.flink.runtime.concurrent.FutureUtils$1.onComplete(FutureUtils.java:772)
    at akka.dispatch.OnComplete.internal(Future.scala:258)
    at akka.dispatch.OnComplete.internal(Future.scala:256)
    at akka.dispatch.japi$CallbackBridge.apply(Future.scala:186)
    at akka.dispatch.japi$CallbackBridge.apply(Future.scala:183)
    at scala.concurrent.impl.CallbackRunnable.run(Promise.scala:36)
    at org.apache.flink.runtime.concurrent.Executors$DirectExecutionContext.execute(Executors.java:83)
    at scala.concurrent.impl.CallbackRunnable.executeWithValue(Promise.scala:44)
    at scala.concurrent.impl.Promise$DefaultPromise.tryComplete(Promise.scala:252)
    at akka.pattern.PromiseActorRef$$anonfun$1.apply$mcV$sp(AskSupport.scala:603)
    at akka.actor.Scheduler$$anon$4.run(Scheduler.scala:126)
    at scala.concurrent.Future$InternalCallbackExecutor$.unbatchedExecute(Future.scala:601)
    at scala.concurrent.BatchingExecutor$class.execute(BatchingExecutor.scala:109)
    at scala.concurrent.Future$InternalCallbackExecutor$.execute(Future.scala:599)
    at akka.actor.LightArrayRevolverScheduler$TaskHolder.executeTask(LightArrayRevolverScheduler.scala:329)
    at akka.actor.LightArrayRevolverScheduler$$anon$4.executeBucket$1(LightArrayRevolverScheduler.scala:280)
    at akka.actor.LightArrayRevolverScheduler$$anon$4.nextTick(LightArrayRevolverScheduler.scala:284)
    at akka.actor.LightArrayRevolverScheduler$$anon$4.run(LightArrayRevolverScheduler.scala:236)
    at java.lang.Thread.run(Thread.java:748)
Caused by: akka.pattern.AskTimeoutException: Ask timed out on [Actor[akka.tcp://flink@tdh2:56529/user/dispatcher#299521377]] after [10000 ms]. Sender[null] sent message of type "org.apache.flink.runtime.rpc.messages.RemoteFencedMessage".
    at akka.pattern.PromiseActorRef$$anonfun$1.apply$mcV$sp(AskSupport.scala:604)
    ... 9 common frames omitted
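
For readers following the suggestion at the top of this thread to add the high availability configuration to flink-conf.yaml, the fragment below is a minimal sketch of what those entries might look like with ZooKeeper-based HA on a Flink 1.7-era release. The ZooKeeper addresses, the cluster id, and the storage path are placeholders and do not come from this thread; adjust them to your own environment.

```yaml
# Sketch of HA-related entries in flink-conf.yaml.
# All values are placeholders (assumptions), not taken from this thread.
high-availability: zookeeper
high-availability.zookeeper.quorum: zk-0.example:2181,zk-1.example:2181,zk-2.example:2181
high-availability.zookeeper.path.root: /flink
high-availability.cluster-id: /my-flink-cluster
# Durable storage for jobmanager metadata; must be reachable from every pod.
high-availability.storageDir: hdfs:///flink/ha/
# Optional: fix the jobmanager RPC port used in HA mode. The default is 0,
# meaning a port is chosen automatically; a fixed port may be easier to
# expose through a Kubernetes service.
high-availability.jobmanager.port: 6123
```

A file like this could be stored in a ConfigMap and mounted into the jobmanager pod, or baked into the Flink image, as the reply describes.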