Hello,

I'm using Flink 1.5.0 and the Flink CLI to submit a jar to the Flink cluster.
In 1.5 we reworked the job submission to go through the REST API instead of Akka.
I believe the jobmanager rpc port shouldn't be necessary anymore. The rpc address is still required due to some technical implementations; it may be that you can set it to some arbitrary value, however.

As a result, the REST API (i.e. the web server) must be running in order to submit jobs.

On 19.06.2018 14:12, Sampath Bhat wrote:
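For readers following along, here is a minimal sketch of what REST-based submission looks like, using the /jars endpoints of the Flink 1.5 REST API. The host, port, jar path, and jar id are placeholders, and the commands are echoed rather than executed so the sketch stands on its own:

```shell
# Placeholders: adjust JM_HOST/REST_PORT to your cluster.
JM_HOST="localhost"
REST_PORT="8081"
BASE="http://${JM_HOST}:${REST_PORT}"

# 1) Upload the job jar; the JSON response contains the jar id.
echo curl -X POST -F "jarfile=@/path/to/job.jar" "${BASE}/jars/upload"

# 2) Run the uploaded jar by id (substitute the id returned by step 1).
echo curl -X POST "${BASE}/jars/<jar-id>/run"
```

Remove the `echo` prefixes to run the commands against a live cluster.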
Hi Chesnay,

If the REST API (i.e. the web server) is mandatory for submitting jobs, then why is there an option to set rest.port to -1? I think it should be mandatory to set a valid port for rest.port, and the Flink JobManager should not come up if no valid port is set. Otherwise, there must be some way to submit jobs even if the REST API (i.e. the web server) is not instantiated.

On Tue, Jun 19, 2018 at 5:49 PM, Chesnay Schepler <[hidden email]> wrote:
Hi Chesnay,

Adding on to this point you made - "the rpc address is still required due to some technical implementations; it may be that you can set this to some arbitrary value however."

For job submission to happen successfully we should give the specific rpc address and not an arbitrary value. If an arbitrary value is given, the job submission fails with the following error:

org.apache.flink.client.deployment.ClusterRetrieveException: Couldn't retrieve standalone cluster
    at org.apache.flink.client.deployment.StandaloneClusterDescriptor.retrieve(StandaloneClusterDescriptor.java:51)
    at org.apache.flink.client.deployment.StandaloneClusterDescriptor.retrieve(StandaloneClusterDescriptor.java:31)
    at org.apache.flink.client.cli.CliFrontend.runProgram(CliFrontend.java:249)
    at org.apache.flink.client.cli.CliFrontend.run(CliFrontend.java:210)
    at org.apache.flink.client.cli.CliFrontend.parseParameters(CliFrontend.java:1020)
    at org.apache.flink.client.cli.CliFrontend.lambda$main$9(CliFrontend.java:1096)
    at org.apache.flink.runtime.security.NoOpSecurityContext.runSecured(NoOpSecurityContext.java:30)
    at org.apache.flink.client.cli.CliFrontend.main(CliFrontend.java:1096)
Caused by: java.net.UnknownHostException: flinktest-flink-jobmanager1233445: Name or service not known
(Random name flinktest-flink-jobmanager1233445)
    at java.net.Inet6AddressImpl.lookupAllHostAddr(Native Method)
    at java.net.InetAddress$2.lookupAllHostAddr(InetAddress.java:928)
    at java.net.InetAddress.getAddressesFromNameService(InetAddress.java:1323)
    at java.net.InetAddress.getAllByName0(InetAddress.java:1276)
    at java.net.InetAddress.getAllByName(InetAddress.java:1192)
    at java.net.InetAddress.getAllByName(InetAddress.java:1126)
    at java.net.InetAddress.getByName(InetAddress.java:1076)
    at org.apache.flink.runtime.rpc.akka.AkkaRpcServiceUtils.getRpcUrl(AkkaRpcServiceUtils.java:171)
    at org.apache.flink.runtime.rpc.akka.AkkaRpcServiceUtils.getRpcUrl(AkkaRpcServiceUtils.java:136)
    at org.apache.flink.runtime.highavailability.HighAvailabilityServicesUtils.createHighAvailabilityServices(HighAvailabilityServicesUtils.java:83)
    at org.apache.flink.client.program.ClusterClient.<init>(ClusterClient.java:158)
    at org.apache.flink.client.program.rest.RestClusterClient.<init>(RestClusterClient.java:184)
    at org.apache.flink.client.program.rest.RestClusterClient.<init>(RestClusterClient.java:157)
    at org.apache.flink.client.deployment.StandaloneClusterDescriptor.retrieve(StandaloneClusterDescriptor.java:49)
    ... 7 more

On Wed, Jun 20, 2018 at 11:18 AM, Sampath Bhat <[hidden email]> wrote:
I was worried this might be the case.

The rest.port handling was simply copied from the legacy web server, which explicitly allowed shutting it down. It may (I'm not entirely sure) also not be necessary for all deployment modes, for example if the job is baked into the job-/taskmanager images.

I'm not quite sure whether the rpc address is actually required for the REST job submission, or only because we still rely partly on some legacy code (ClusterClient). Maybe Till (cc) knows the answer to that.

> Adding on to this point you made - "the rpc address is still required due
> to some technical implementations; it may be that you can set this to some
> arbitrary value however."
>
> For job submission to happen successfully we should give the specific rpc
> address and not an arbitrary value. If an arbitrary value is given, the job
> submission fails with the UnknownHostException shown earlier in the thread.
>
> On Wed, Jun 20, 2018 at 11:18 AM, Sampath Bhat <[hidden email]> wrote:
>
>> If jobmanager.rpc.address is not required for the flink client, then why
>> does it still look for that property in flink-conf.yaml? Isn't that a bug?
>> Because if we comment out jobmanager.rpc.address and jobmanager.rpc.port,
>> then the flink client is not able to submit the job.
>>
>> On Tue, Jun 19, 2018 at 5:49 PM, Chesnay Schepler <[hidden email]> wrote:
>>
>>> On 19.06.2018 14:12, Sampath Bhat wrote:
>>>
>>> In flink 1.4.2, only the JobManager rpc address and rpc port were
>>> sufficient for the flink client to connect to the JobManager and submit
>>> the job.
>>>
>>> But in flink 1.5.0 the flink client additionally requires rest.address
>>> and rest.port for submitting the job to the JobManager. What is the
>>> advantage of this new method over the 1.4.2 method of submitting jobs?
>>>
>>> Moreover, if we set rest.port = -1 the web server will not be
>>> instantiated; then how should we submit the job?
Hi Sampath,

it is no longer possible to not start the rest server endpoint by setting rest.port to -1. If you do this, the cluster won't start. The comment in flink-conf.yaml only holds true for the legacy mode.

In non-HA setups we need jobmanager.rpc.address to derive the hostname of the rest server. jobmanager.rpc.port is no longer needed by the client, only by the other cluster components (TMs). When using the HA mode, every address will be retrieved from ZooKeeper.

I hope this clarifies things.

Cheers,
Till

On Wed, Jun 20, 2018 at 9:24 AM Chesnay Schepler <[hidden email]> wrote:
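To make Till's summary concrete, a flink-conf.yaml along these lines would cover both cases he describes (all host names and the ZooKeeper quorum are placeholders):

```yaml
# Non-HA: the client derives the rest server's hostname from this;
# the rpc port is only needed by other cluster components (TMs).
jobmanager.rpc.address: flink-jobmanager
jobmanager.rpc.port: 6123
rest.port: 8081

# HA: with these set, every address is retrieved from ZooKeeper instead.
# high-availability: zookeeper
# high-availability.zookeeper.quorum: zk1:2181,zk2:2181,zk3:2181
# high-availability.storageDir: hdfs:///flink/ha
```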
Shouldn't the non-HA case be covered by rest.address?
On 20.06.2018 09:40, Till Rohrmann wrote:
It will, but it defaults to jobmanager.rpc.address if no rest.address has been specified.

On Wed, Jun 20, 2018 at 9:49 AM Chesnay Schepler <[hidden email]> wrote:
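The fallback Till describes can be sketched like this (an illustration of the precedence only, not Flink's actual code):

```shell
# rest.address takes precedence; jobmanager.rpc.address is the fallback.
resolve_rest_address() {
  rest_address="$1"
  rpc_address="$2"
  echo "${rest_address:-$rpc_address}"
}

resolve_rest_address "" "flink-jobmanager"            # falls back to the rpc address
resolve_rest_address "flink-rest" "flink-jobmanager"  # rest.address wins
```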
Hello Till,

Thanks for the clarification, but I have a few questions based on your reply. If rest.address is not provided in flink-conf.yaml, then looking at jobmanager.rpc.address to derive the hostname of the rest server makes sense. But when the user has already provided rest.address and Flink still looks at jobmanager.rpc.address to get the hostname of the rest server, that is an unwanted dependency IMO.

On Wed, Jun 20, 2018 at 1:25 PM, Till Rohrmann <[hidden email]> wrote:
Hi,

if the rest.address is different from the jobmanager.rpc.address, then you should specify it in flink-conf.yaml and Flink will connect to rest.address. Only if rest.address is not specified will the system fall back to jobmanager.rpc.address. Currently, the rest server endpoint runs in the same JVM as the cluster entrypoint and all JobMasters.

Cheers,
Till

On Thu, Jun 21, 2018 at 8:46 AM Sampath Bhat <[hidden email]> wrote:
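In flink-conf.yaml terms, the case Till describes, where the REST endpoint is reachable under a different hostname than the JobManager RPC endpoint, would look roughly like this (host names are placeholders):

```yaml
jobmanager.rpc.address: flink-jobmanager  # cluster-internal RPC
rest.address: flink-rest                  # the client connects here; overrides the fallback
rest.port: 8081
```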
Hello Till,

Yes, I've specified rest.address for the flink client to connect to, and rest.address is valid and working fine. But my question is: why am I required to give jobmanager.rpc.address for the flink client to connect to the Flink cluster, if the flink client depends only on rest.address?

On Thu, Jun 21, 2018 at 12:41 PM, Till Rohrmann <[hidden email]> wrote:
The reason why you still have to do it is that we still have to support the legacy mode, where the client needs to know the JobManager RPC address. Once we remove the legacy mode, we could change the HighAvailabilityServices such that we have client-facing HA services, which only retrieve the rest server endpoint, and cluster-internal HA services, which need to know the cluster components' addresses at cluster startup.

Cheers,
Till

On Thu, Jun 21, 2018 at 11:38 AM Sampath Bhat <[hidden email]> wrote:
In reply to this post by Till Rohrmann
Hi Till,

So how can we get the right rest address and port when using HA mode on YARN? I can get it from the REST API "/jars". But when I submit a job using flink run -m , it fails:

org.apache.flink.client.program.ProgramInvocationException: Could not retrieve the execution result.
    at org.apache.flink.client.program.rest.RestClusterClient.submitJob(RestClusterClient.java:258)
    at org.apache.flink.client.program.ClusterClient.run(ClusterClient.java:464)
    at org.apache.flink.streaming.api.environment.StreamContextEnvironment.execute(StreamContextEnvironment.java:66)
    at org.apache.flink.streaming.api.scala.StreamExecutionEnvironment.execute(StreamExecutionEnvironment.scala:654)
    at scala.Function0$class.apply$mcV$sp(Function0.scala:34)
    at scala.runtime.AbstractFunction0.apply$mcV$sp(AbstractFunction0.scala:12)
    at scala.App$$anonfun$main$1.apply(App.scala:76)
    at scala.App$$anonfun$main$1.apply(App.scala:76)
    at scala.collection.immutable.List.foreach(List.scala:392)
    at scala.collection.generic.TraversableForwarder$class.foreach(TraversableForwarder.scala:35)
    at scala.App$class.main(App.scala:76)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at org.apache.flink.client.program.PackagedProgram.callMainMethod(PackagedProgram.java:528)
    at org.apache.flink.client.program.PackagedProgram.invokeInteractiveModeForExecution(PackagedProgram.java:420)
    at org.apache.flink.client.program.ClusterClient.run(ClusterClient.java:404)
    at org.apache.flink.client.cli.CliFrontend.executeProgram(CliFrontend.java:798)
    at org.apache.flink.client.cli.CliFrontend.runProgram(CliFrontend.java:289)
    at org.apache.flink.client.cli.CliFrontend.run(CliFrontend.java:215)
    at org.apache.flink.client.cli.CliFrontend.parseParameters(CliFrontend.java:1035)
    at org.apache.flink.client.cli.CliFrontend.lambda$main$9(CliFrontend.java:1111)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:422)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1754)
    at org.apache.flink.runtime.security.HadoopSecurityContext.runSecured(HadoopSecurityContext.java:41)
    at org.apache.flink.client.cli.CliFrontend.main(CliFrontend.java:1111)
Caused by: org.apache.flink.runtime.client.JobSubmissionException: Failed to submit JobGraph.
    at org.apache.flink.client.program.rest.RestClusterClient.lambda$submitJob$8(RestClusterClient.java:371)
    at java.util.concurrent.CompletableFuture.uniExceptionally(CompletableFuture.java:870)
    at java.util.concurrent.CompletableFuture$UniExceptionally.tryFire(CompletableFuture.java:852)
    at java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:474)
    at java.util.concurrent.CompletableFuture.completeExceptionally(CompletableFuture.java:1977)
    at org.apache.flink.runtime.concurrent.FutureUtils.lambda$retryOperationWithDelay$5(FutureUtils.java:203)
    at java.util.concurrent.CompletableFuture.uniWhenComplete(CompletableFuture.java:760)
    at java.util.concurrent.CompletableFuture$UniWhenComplete.tryFire(CompletableFuture.java:736)
    at java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:474)
    at java.util.concurrent.CompletableFuture.completeExceptionally(CompletableFuture.java:1977)
    at org.apache.flink.runtime.concurrent.FutureUtils$Timeout.run(FutureUtils.java:795)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)
Caused by: java.util.concurrent.TimeoutException
    ... 8 more

--
Sent from: http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/
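One way to discover the REST endpoint of a Flink-on-YARN session is via the standard YARN ResourceManager REST API: the application report's trackingUrl points at Flink's web/REST server. This is a hedged sketch; the RM host/port and application id are placeholders, and the command is echoed rather than executed:

```shell
RM_HOST="resourcemanager"                # placeholder ResourceManager host
APP_ID="application_1234567890123_0001"  # placeholder YARN application id

# The 'trackingUrl' field in the JSON response points at the Flink
# web/REST server for this application.
echo curl "http://${RM_HOST}:8088/ws/v1/cluster/apps/${APP_ID}"
```

Alternatively, if I recall correctly, the flink CLI can attach to a running YARN session with -yid <applicationId>, in which case it resolves the endpoint itself.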