Submitting job to Flink on YARN times out on FLIP-6 1.5.x

Richard Deurwaarder
Hello,

I am trying to upgrade our job from Flink 1.4.2 to 1.7.1, but I keep running into timeouts after submitting the job.

The Flink job runs on our Hadoop cluster and is started via YARN.

Relevant config options seem to be:

jobmanager.rpc.port: 55501

recovery.jobmanager.port: 55502

yarn.application-master.port: 55503

blob.server.port: 55504

I've seen the following behavior:
  - Using the same flink-conf.yaml as we used in 1.4.2, versions 1.5.6, 1.6.3, and 1.7.1 all time out, while 1.4.2 works.
  - Using 1.5.6 with "mode: legacy" (to switch off FLIP-6) works.
  - Using 1.7.1 with "mode: legacy" times out (I assume this option was removed but the documentation is outdated? https://ci.apache.org/projects/flink/flink-docs-stable/ops/config.html#legacy)

When the timeout happens I get the following stack trace:

INFO  class java.time.Instant does not contain a getter for field seconds  2019-02-18T10:16:56.815+01:00
INFO  class com.bol.fin_hdp.cm1.domain.Cm1Transportable does not contain a getter for field globalId  2019-02-18T10:16:56.815+01:00
INFO  Submitting job 5af931bcef395a78b5af2b97e92dcffe (detached: false).  2019-02-18T10:16:57.182+01:00
INFO  ------------------------------------------------------------  2019-02-18T10:29:27.527+01:00
INFO  The program finished with the following exception:  2019-02-18T10:29:27.564+01:00
INFO  org.apache.flink.client.program.ProgramInvocationException: The main method caused an error.  2019-02-18T10:29:27.601+01:00
INFO      at org.apache.flink.client.program.PackagedProgram.callMainMethod(PackagedProgram.java:545)  2019-02-18T10:29:27.638+01:00
INFO      at org.apache.flink.client.program.PackagedProgram.invokeInteractiveModeForExecution(PackagedProgram.java:420)  2019-02-18T10:29:27.675+01:00
INFO      at org.apache.flink.client.program.ClusterClient.run(ClusterClient.java:404)  2019-02-18T10:29:27.711+01:00
INFO      at org.apache.flink.client.cli.CliFrontend.executeProgram(CliFrontend.java:798)  2019-02-18T10:29:27.747+01:00
INFO      at org.apache.flink.client.cli.CliFrontend.runProgram(CliFrontend.java:289)  2019-02-18T10:29:27.784+01:00
INFO      at org.apache.flink.client.cli.CliFrontend.run(CliFrontend.java:215)  2019-02-18T10:29:27.820+01:00
INFO      at org.apache.flink.client.cli.CliFrontend.parseParameters(CliFrontend.java:1035)  2019-02-18T10:29:27.857+01:00
INFO      at org.apache.flink.client.cli.CliFrontend.lambda$main$9(CliFrontend.java:1111)  2019-02-18T10:29:27.893+01:00
INFO      at java.security.AccessController.doPrivileged(Native Method)  2019-02-18T10:29:27.929+01:00
INFO      at javax.security.auth.Subject.doAs(Subject.java:422)  2019-02-18T10:29:27.968+01:00
INFO      at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1754)  2019-02-18T10:29:28.004+01:00
INFO      at org.apache.flink.runtime.security.HadoopSecurityContext.runSecured(HadoopSecurityContext.java:41)  2019-02-18T10:29:28.040+01:00
INFO      at org.apache.flink.client.cli.CliFrontend.main(CliFrontend.java:1111)  2019-02-18T10:29:28.075+01:00
INFO  Caused by: java.lang.RuntimeException: org.apache.flink.client.program.ProgramInvocationException: Could not retrieve the execution result.  2019-02-18T10:29:28.110+01:00
INFO      at com.bol.fin_hdp.job.starter.IntervalJobStarter.startJob(IntervalJobStarter.java:43)  2019-02-18T10:29:28.146+01:00
INFO      at com.bol.fin_hdp.job.starter.IntervalJobStarter.startJobWithConfig(IntervalJobStarter.java:32)  2019-02-18T10:29:28.182+01:00
INFO      at com.bol.fin_hdp.Main.main(Main.java:8)  2019-02-18T10:29:28.217+01:00
INFO      at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)  2019-02-18T10:29:28.253+01:00
INFO      at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)  2019-02-18T10:29:28.289+01:00
INFO      at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)  2019-02-18T10:29:28.325+01:00
INFO      at java.lang.reflect.Method.invoke(Method.java:498)  2019-02-18T10:29:28.363+01:00
INFO      at org.apache.flink.client.program.PackagedProgram.callMainMethod(PackagedProgram.java:528)  2019-02-18T10:29:28.400+01:00
INFO      ... 12 more  2019-02-18T10:29:28.436+01:00
INFO  Caused by: org.apache.flink.client.program.ProgramInvocationException: Could not retrieve the execution result.  2019-02-18T10:29:28.473+01:00
INFO      at org.apache.flink.client.program.rest.RestClusterClient.submitJob(RestClusterClient.java:258)  2019-02-18T10:29:28.509+01:00
INFO      at org.apache.flink.client.program.ClusterClient.run(ClusterClient.java:464)  2019-02-18T10:29:28.544+01:00
INFO      at org.apache.flink.streaming.api.environment.StreamContextEnvironment.execute(StreamContextEnvironment.java:66)  2019-02-18T10:29:28.581+01:00
INFO      at com.bol.fin_hdp.cm1.job.Job.execute(Job.java:54)  2019-02-18T10:29:28.617+01:00
INFO      at com.bol.fin_hdp.job.starter.IntervalJobStarter.startJob(IntervalJobStarter.java:41)  2019-02-18T10:29:28.654+01:00
INFO      ... 19 more  2019-02-18T10:29:28.693+01:00
INFO  Caused by: org.apache.flink.runtime.client.JobSubmissionException: Failed to submit JobGraph.  2019-02-18T10:29:28.730+01:00
INFO      at org.apache.flink.client.program.rest.RestClusterClient.lambda$submitJob$8(RestClusterClient.java:371)  2019-02-18T10:29:28.766+01:00
INFO      at java.util.concurrent.CompletableFuture.uniExceptionally(CompletableFuture.java:870)  2019-02-18T10:29:28.803+01:00
INFO      at java.util.concurrent.CompletableFuture$UniExceptionally.tryFire(CompletableFuture.java:852)  2019-02-18T10:29:28.839+01:00
INFO      at java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:474)  2019-02-18T10:29:28.876+01:00
INFO      at java.util.concurrent.CompletableFuture.completeExceptionally(CompletableFuture.java:1977)  2019-02-18T10:29:28.912+01:00
INFO      at org.apache.flink.runtime.concurrent.FutureUtils.lambda$retryOperationWithDelay$5(FutureUtils.java:216)  2019-02-18T10:29:28.948+01:00
INFO      at java.util.concurrent.CompletableFuture.uniWhenComplete(CompletableFuture.java:760)  2019-02-18T10:29:28.986+01:00
INFO      at java.util.concurrent.CompletableFuture$UniWhenComplete.tryFire(CompletableFuture.java:736)  2019-02-18T10:29:29.023+01:00
INFO      at java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:474)  2019-02-18T10:29:29.060+01:00
INFO      at java.util.concurrent.CompletableFuture.completeExceptionally(CompletableFuture.java:1977)  2019-02-18T10:29:29.096+01:00
INFO      at org.apache.flink.runtime.rest.RestClient.lambda$submitRequest$1(RestClient.java:301)  2019-02-18T10:29:29.133+01:00
INFO      at org.apache.flink.shaded.netty4.io.netty.util.concurrent.DefaultPromise.notifyListener0(DefaultPromise.java:680)  2019-02-18T10:29:29.169+01:00
INFO      at org.apache.flink.shaded.netty4.io.netty.util.concurrent.DefaultPromise.notifyListeners0(DefaultPromise.java:603)  2019-02-18T10:29:29.206+01:00
INFO      at org.apache.flink.shaded.netty4.io.netty.util.concurrent.DefaultPromise.notifyListeners(DefaultPromise.java:563)  2019-02-18T10:29:29.242+01:00
INFO      at org.apache.flink.shaded.netty4.io.netty.util.concurrent.DefaultPromise.tryFailure(DefaultPromise.java:424)  2019-02-18T10:29:29.278+01:00
INFO      at org.apache.flink.shaded.netty4.io.netty.channel.nio.AbstractNioChannel$AbstractNioUnsafe$1.run(AbstractNioChannel.java:214)  2019-02-18T10:29:29.315+01:00
INFO      at org.apache.flink.shaded.netty4.io.netty.util.concurrent.PromiseTask$RunnableAdapter.call(PromiseTask.java:38)  2019-02-18T10:29:29.352+01:00
INFO      at org.apache.flink.shaded.netty4.io.netty.util.concurrent.ScheduledFutureTask.run(ScheduledFutureTask.java:120)  2019-02-18T10:29:29.388+01:00
INFO      at org.apache.flink.shaded.netty4.io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:357)  2019-02-18T10:29:29.424+01:00
INFO      at org.apache.flink.shaded.netty4.io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:357)  2019-02-18T10:29:29.460+01:00
INFO      at org.apache.flink.shaded.netty4.io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:111)  2019-02-18T10:29:29.496+01:00
INFO      at org.apache.flink.shaded.netty4.io.netty.util.concurrent.DefaultThreadFactory$DefaultRunnableDecorator.run(DefaultThreadFactory.java:137)  2019-02-18T10:29:29.532+01:00
INFO      at java.lang.Thread.run(Thread.java:748)  2019-02-18T10:29:29.569+01:00
INFO  Caused by: org.apache.flink.runtime.concurrent.FutureUtils$RetryException: Could not complete the operation. Number of retries has been exhausted.  2019-02-18T10:29:29.606+01:00
INFO      at org.apache.flink.runtime.concurrent.FutureUtils.lambda$retryOperationWithDelay$5(FutureUtils.java:213)  2019-02-18T10:29:29.643+01:00
INFO      ... 17 more  2019-02-18T10:29:29.680+01:00
INFO  Caused by: java.util.concurrent.CompletionException: org.apache.flink.shaded.netty4.io.netty.channel.ConnectTimeoutException: connection timed out: shd-hdp-b-slave-01...  2019-02-18T10:29:29.717+01:00
INFO      at java.util.concurrent.CompletableFuture.encodeThrowable(CompletableFuture.java:292)  2019-02-18T10:29:29.753+01:00
INFO      at java.util.concurrent.CompletableFuture.completeThrowable(CompletableFuture.java:308)  2019-02-18T10:29:29.789+01:00
INFO      at java.util.concurrent.CompletableFuture.uniCompose(CompletableFuture.java:943)  2019-02-18T10:29:29.826+01:00
INFO      at java.util.concurrent.CompletableFuture$UniCompose.tryFire(CompletableFuture.java:926)  2019-02-18T10:29:29.862+01:00
INFO      ... 15 more  2019-02-18T10:29:29.898+01:00
INFO  Caused by: org.apache.flink.shaded.netty4.io.netty.channel.ConnectTimeoutException: connection timed out: shd-hdp-b-slave-017.example.com/some.ip.address:46500  2019-02-18T10:29:29.934+01:00
INFO      at org.apache.flink.shaded.netty4.io.netty.channel.nio.AbstractNioChannel$AbstractNioUnsafe$1.run(AbstractNioChannel.java:212)  2019-02-18T10:29:29.970+01:00
INFO      ... 7 more
Does anyone have tips on how to debug this, or on which configuration changes I need to make?
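
One way to start debugging a ConnectTimeoutException like the last "Caused by" in the trace is to check, from the machine that submits the job, whether the reported host and port accept TCP connections at all. The following is only a minimal sketch and not part of the original message: the function name can_connect and the 5-second default timeout are illustrative assumptions, and the host name in the usage comment is the anonymized placeholder from the trace.

```python
import socket


def can_connect(host: str, port: int, timeout_s: float = 5.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout_s."""
    try:
        # create_connection resolves the host name and performs the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout_s):
            return True
    except OSError:
        # Refused connections, timeouts, and DNS failures are all OSError subclasses.
        return False


# Hypothetical usage with the placeholder host and the port from the trace:
# can_connect("shd-hdp-b-slave-017.example.com", 46500)
```

If such a probe fails for the port in the trace but succeeds for the ports configured in flink-conf.yaml, that would point to a network or firewall restriction rather than to the job itself.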
Re: Submitting job to Flink on YARN times out on FLIP-6 1.5.x

Gary Yao-4
Hi,

Beginning with Flink 1.7, you cannot use the legacy mode anymore [1][2]. I am
currently working on removing references to the legacy mode in the
documentation [3]. Is there any reason you cannot use the "new mode"?

Best,
Gary

[1] https://flink.apache.org/news/2018/11/30/release-1.7.0.html
[2] https://issues.apache.org/jira/browse/FLINK-10392
[3] https://issues.apache.org/jira/browse/FLINK-11713

Re: Submitting job to Flink on YARN times out on FLIP-6 1.5.x

Richard Deurwaarder
Hello Gary,

Thank you for your response. 

I'd like to use the new mode, but it does not work for me. It seems I am running into a firewall issue.

The rest.port is chosen at random when running on YARN [1]. The machine I use to deploy the job can, in fact, start the Flink cluster, but it cannot submit the job on the randomly chosen port because our firewall blocks it.

Do you know if this is still the case in 1.7, and whether there is any way to work around it?

Richard
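
The firewall explanation can be cross-checked against the stack trace from the first message: the port in the ConnectTimeoutException line (46500) is not one of the four fixed ports listed from flink-conf.yaml (55501 to 55504), which is at least consistent with a randomly chosen REST port. The sketch below shows one way to automate that comparison. It is an illustration only; the names timed_out_targets and unexpected_ports are hypothetical helpers, not Flink APIs, and the regular expression assumes the "connection timed out: host/ip:port" message layout seen in this trace.

```python
import re

# Matches "connection timed out: <host>/<ip>:<port>". The lookahead lets the
# port end at a non-digit, at end of line, or directly before an ISO date in
# case a log viewer glued the timestamp onto the end of the message.
_TARGET = re.compile(
    r"connection timed out: (?P<host>[^/\s]+)/[^:\s]+:(?P<port>\d+?)"
    r"(?=\d{4}-\d{2}-\d{2}T|\D|$)",
    re.MULTILINE,
)


def timed_out_targets(log_text):
    """Return the sorted, de-duplicated (host, port) pairs the client failed to reach."""
    return sorted({(m["host"], int(m["port"])) for m in _TARGET.finditer(log_text)})


def unexpected_ports(log_text, configured_ports):
    """Return timed-out ports that are not among the fixed, configured ports."""
    seen = {port for _, port in timed_out_targets(log_text)}
    return sorted(seen - set(configured_ports))


# Last "Caused by" message from the trace, with the anonymized host and IP.
trace = (
    "ConnectTimeoutException: connection timed out: "
    "shd-hdp-b-slave-017.example.com/some.ip.address:46500"
)
print(unexpected_ports(trace, {55501, 55502, 55503, 55504}))  # prints [46500]
```

A non-empty result means the client tried to reach a port outside the fixed set, which is the situation described above.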


Re: Submitting job to Flink on YARN times out on FLIP-6 1.5.x

Gary Yao-4

INFOat org.apache.flink.client.cli.CliFrontend.parseParameters(CliFrontend.java:1035)2019-02-18T10:29:27.857+01:00
INFOat org.apache.flink.client.cli.CliFrontend.lambda$main$9(CliFrontend.java:1111)2019-02-18T10:29:27.893+01:00
INFOat java.security.AccessController.doPrivileged(Native Method)2019-02-18T10:29:27.929+01:00
INFOat javax.security.auth.Subject.doAs(Subject.java:422)2019-02-18T10:29:27.968+01:00
INFOat org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1754)2019-02-18T10:29:28.004+01:00
INFOat org.apache.flink.runtime.security.HadoopSecurityContext.runSecured(HadoopSecurityContext.java:41)2019-02-18T10:29:28.040+01:00
INFOat org.apache.flink.client.cli.CliFrontend.main(CliFrontend.java:1111)2019-02-18T10:29:28.075+01:00
INFOCaused by: java.lang.RuntimeException: org.apache.flink.client.program.ProgramInvocationException: Could not retrieve the execution result.2019-02-18T10:29:28.110+01:00
INFOat com.bol.fin_hdp.job.starter.IntervalJobStarter.startJob(IntervalJobStarter.java:43)2019-02-18T10:29:28.146+01:00
INFOat com.bol.fin_hdp.job.starter.IntervalJobStarter.startJobWithConfig(IntervalJobStarter.java:32)2019-02-18T10:29:28.182+01:00
INFOat com.bol.fin_hdp.Main.main(Main.java:8)2019-02-18T10:29:28.217+01:00
INFOat sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)2019-02-18T10:29:28.253+01:00
INFOat sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)2019-02-18T10:29:28.289+01:00
INFOat sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)2019-02-18T10:29:28.325+01:00
INFOat java.lang.reflect.Method.invoke(Method.java:498)2019-02-18T10:29:28.363+01:00
INFOat org.apache.flink.client.program.PackagedProgram.callMainMethod(PackagedProgram.java:528)2019-02-18T10:29:28.400+01:00
INFO... 12 more2019-02-18T10:29:28.436+01:00
INFOCaused by: org.apache.flink.client.program.ProgramInvocationException: Could not retrieve the execution result.2019-02-18T10:29:28.473+01:00
INFOat org.apache.flink.client.program.rest.RestClusterClient.submitJob(RestClusterClient.java:258)2019-02-18T10:29:28.509+01:00
INFOat org.apache.flink.client.program.ClusterClient.run(ClusterClient.java:464)2019-02-18T10:29:28.544+01:00
INFOat org.apache.flink.streaming.api.environment.StreamContextEnvironment.execute(StreamContextEnvironment.java:66)2019-02-18T10:29:28.581+01:00
INFOat com.bol.fin_hdp.cm1.job.Job.execute(Job.java:54)2019-02-18T10:29:28.617+01:00
INFOat com.bol.fin_hdp.job.starter.IntervalJobStarter.startJob(IntervalJobStarter.java:41)2019-02-18T10:29:28.654+01:00
INFO... 19 more2019-02-18T10:29:28.693+01:00
INFOCaused by: org.apache.flink.runtime.client.JobSubmissionException: Failed to submit JobGraph.2019-02-18T10:29:28.730+01:00
INFOat org.apache.flink.client.program.rest.RestClusterClient.lambda$submitJob$8(RestClusterClient.java:371)2019-02-18T10:29:28.766+01:00
INFOat java.util.concurrent.CompletableFuture.uniExceptionally(CompletableFuture.java:870)2019-02-18T10:29:28.803+01:00
INFOat java.util.concurrent.CompletableFuture$UniExceptionally.tryFire(CompletableFuture.java:852)2019-02-18T10:29:28.839+01:00
INFOat java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:474)2019-02-18T10:29:28.876+01:00
INFOat java.util.concurrent.CompletableFuture.completeExceptionally(CompletableFuture.java:1977)2019-02-18T10:29:28.912+01:00
INFOat org.apache.flink.runtime.concurrent.FutureUtils.lambda$retryOperationWithDelay$5(FutureUtils.java:216)2019-02-18T10:29:28.948+01:00
INFOat java.util.concurrent.CompletableFuture.uniWhenComplete(CompletableFuture.java:760)2019-02-18T10:29:28.986+01:00
INFOat java.util.concurrent.CompletableFuture$UniWhenComplete.tryFire(CompletableFuture.java:736)2019-02-18T10:29:29.023+01:00
INFOat java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:474)2019-02-18T10:29:29.060+01:00
INFOat java.util.concurrent.CompletableFuture.completeExceptionally(CompletableFuture.java:1977)2019-02-18T10:29:29.096+01:00
INFOat org.apache.flink.runtime.rest.RestClient.lambda$submitRequest$1(RestClient.java:301)2019-02-18T10:29:29.133+01:00
INFOat org.apache.flink.shaded.netty4.io.netty.util.concurrent.DefaultPromise.notifyListener0(DefaultPromise.java:680)2019-02-18T10:29:29.169+01:00
INFOat org.apache.flink.shaded.netty4.io.netty.util.concurrent.DefaultPromise.notifyListeners0(DefaultPromise.java:603)2019-02-18T10:29:29.206+01:00
INFOat org.apache.flink.shaded.netty4.io.netty.util.concurrent.DefaultPromise.notifyListeners(DefaultPromise.java:563)2019-02-18T10:29:29.242+01:00
INFOat org.apache.flink.shaded.netty4.io.netty.util.concurrent.DefaultPromise.tryFailure(DefaultPromise.java:424)2019-02-18T10:29:29.278+01:00
INFOat org.apache.flink.shaded.netty4.io.netty.channel.nio.AbstractNioChannel$AbstractNioUnsafe$1.run(AbstractNioChannel.java:214)2019-02-18T10:29:29.315+01:00
INFOat org.apache.flink.shaded.netty4.io.netty.util.concurrent.PromiseTask$RunnableAdapter.call(PromiseTask.java:38)2019-02-18T10:29:29.352+01:00
INFOat org.apache.flink.shaded.netty4.io.netty.util.concurrent.ScheduledFutureTask.run(ScheduledFutureTask.java:120)2019-02-18T10:29:29.388+01:00
INFOat org.apache.flink.shaded.netty4.io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:357)2019-02-18T10:29:29.424+01:00
INFOat org.apache.flink.shaded.netty4.io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:357)2019-02-18T10:29:29.460+01:00
INFOat org.apache.flink.shaded.netty4.io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:111)2019-02-18T10:29:29.496+01:00
INFOat org.apache.flink.shaded.netty4.io.netty.util.concurrent.DefaultThreadFactory$DefaultRunnableDecorator.run(DefaultThreadFactory.java:137)2019-02-18T10:29:29.532+01:00
INFOat java.lang.Thread.run(Thread.java:748)2019-02-18T10:29:29.569+01:00
INFOCaused by: org.apache.flink.runtime.concurrent.FutureUtils$RetryException: Could not complete the operation. Number of retries has been exhausted.2019-02-18T10:29:29.606+01:00
INFOat org.apache.flink.runtime.concurrent.FutureUtils.lambda$retryOperationWithDelay$5(FutureUtils.java:213)2019-02-18T10:29:29.643+01:00
INFO... 17 more2019-02-18T10:29:29.680+01:00
INFOCaused by: java.util.concurrent.CompletionException: org.apache.flink.shaded.netty4.io.netty.channel.ConnectTimeoutException: connection timed out: shd-hdp-b-slave-01...2019-02-18T10:29:29.717+01:00
INFOat java.util.concurrent.CompletableFuture.encodeThrowable(CompletableFuture.java:292)2019-02-18T10:29:29.753+01:00
INFOat java.util.concurrent.CompletableFuture.completeThrowable(CompletableFuture.java:308)2019-02-18T10:29:29.789+01:00
INFOat java.util.concurrent.CompletableFuture.uniCompose(CompletableFuture.java:943)2019-02-18T10:29:29.826+01:00
INFOat java.util.concurrent.CompletableFuture$UniCompose.tryFire(CompletableFuture.java:926)2019-02-18T10:29:29.862+01:00
INFO... 15 more2019-02-18T10:29:29.898+01:00
INFOCaused by: org.apache.flink.shaded.netty4.io.netty.channel.ConnectTimeoutException: connection timed out: shd-hdp-b-slave-017.example.com/some.ip.address:465002019-02-18T10:29:29.934+01:00
INFOat org.apache.flink.shaded.netty4.io.netty.channel.nio.AbstractNioChannel$AbstractNioUnsafe$1.run(AbstractNioChannel.java:212)2019-02-18T10:29:29.970+01:00
INFO... 7 more
Does anyone have tips how to debug this or what configuration changes I need to make?