FLINK-6117 issue work around

3 messages

FLINK-6117 issue work around

sunny yun
Hi,

Using Flink 1.2.0, I ran into the issue https://issues.apache.org/jira/browse/FLINK-6117.
This issue is fixed in version 1.3.0, but for certain reasons I am trying to find a workaround for 1.2.0.

Here is what I did:
1. changed the source according to https://github.com/apache/flink/commit/eef85e095a8a0e4c4553631b74ba7b9f173cebf0
2. replaced $FLINK_HOME/lib/flink-dist_2.11-1.2.0.jar with the rebuilt jar
3. set "zookeeper.sasl.disable: true" in flink-conf.yaml
4. ran yarn-session.sh
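For reference, the flink-conf.yaml fragment for step 3 might look like this (a minimal sketch; the `zookeeper.sasl.disable` line is the one from this thread, and the HA keys are the values that appear in the yarn-session log later in the thread — adjust them to your environment):

```yaml
# FLINK-6117 workaround: disable SASL authentication towards ZooKeeper.
zookeeper.sasl.disable: true

# HA settings as used in this thread's environment:
high-availability: zookeeper
high-availability.zookeeper.quorum: flink-01:2181,flink-02:2181,flink-03:2181,flink-04:2181
high-availability.zookeeper.path.root: /flink
high-availability.zookeeper.path.namespace: /cluster_one
high-availability.zookeeper.storageDir: hdfs:///flink/recovery
```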


The original problem (Authentication failed) seems to be gone, but now I get this error:

Exception in thread "main" java.lang.RuntimeException: Failed to retrieve JobManager address
        at org.apache.flink.client.program.ClusterClient.getJobManagerAddress(ClusterClient.java:248)
        at org.apache.flink.yarn.cli.FlinkYarnSessionCli.run(FlinkYarnSessionCli.java:627)
        at org.apache.flink.yarn.cli.FlinkYarnSessionCli$1.call(FlinkYarnSessionCli.java:476)
        at org.apache.flink.yarn.cli.FlinkYarnSessionCli$1.call(FlinkYarnSessionCli.java:473)
        at org.apache.flink.runtime.security.HadoopSecurityContext$1.run(HadoopSecurityContext.java:43)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:422)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1656)
        at org.apache.flink.runtime.security.HadoopSecurityContext.runSecured(HadoopSecurityContext.java:40)
        at org.apache.flink.yarn.cli.FlinkYarnSessionCli.main(FlinkYarnSessionCli.java:473)
Caused by: org.apache.flink.runtime.leaderretrieval.LeaderRetrievalException: Could not retrieve the leader address and leader session ID.
        at org.apache.flink.runtime.util.LeaderRetrievalUtils.retrieveLeaderConnectionInfo(LeaderRetrievalUtils.java:175)
        at org.apache.flink.client.program.ClusterClient.getJobManagerAddress(ClusterClient.java:242)
        ... 9 more
Caused by: java.util.concurrent.TimeoutException: Futures timed out after [60000 milliseconds]
        at scala.concurrent.impl.Promise$DefaultPromise.ready(Promise.scala:219)
        at scala.concurrent.impl.Promise$DefaultPromise.result(Promise.scala:223)
        at scala.concurrent.Await$$anonfun$result$1.apply(package.scala:190)
        at scala.concurrent.BlockContext$DefaultBlockContext$.blockOn(BlockContext.scala:53)
        at scala.concurrent.Await$.result(package.scala:190)
        at scala.concurrent.Await.result(package.scala)
        at org.apache.flink.runtime.util.LeaderRetrievalUtils.retrieveLeaderConnectionInfo(LeaderRetrievalUtils.java:173)
        ... 10 more


I believe the related settings (Flink, Hadoop, ZooKeeper) are correct, because yarn-session works smoothly with Flink 1.3.2 in the same environment.

Does anyone have any idea what might cause this error?

Thanks.


Re: FLINK-6117 issue work around

Nico Kruber
I looked at the commit you cherry-picked and nothing in there explains the
error you got. This rather sounds like something might be mixed up between
(remaining artefacts of) flink 1.3 and 1.2.

Can you verify that nothing of your flink 1.3 tests remains, e.g. running
JobManager or TaskManager instances? Also that you're not accidentally running
the yarn-session.sh script of 1.3?


Nico
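The two checks suggested above can be sketched as shell one-liners (a sketch only; the `JobManager`/`TaskManager` process names are how Flink processes appear in `jps`, and the install path is the one used later in this thread):

```shell
# check_leftovers reads `jps` output on stdin and reports any leftover
# Flink JobManager/TaskManager processes; otherwise it says so.
check_leftovers() {
    grep -E 'JobManager|TaskManager' || echo "no leftover Flink processes"
}

# On each node (needs a JDK providing jps):
#   jps | check_leftovers
# Demo with a sample jps listing:
printf '1888 ResourceManager\n2000 NodeManager\n' | check_leftovers
# → no leftover Flink processes

# Also verify which script is actually invoked, so a 1.3 install on the
# PATH is not picked up by accident:
#   type yarn-session.sh
#   /usr/local/flink-1.2.0/bin/yarn-session.sh -n 4
```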

On Wednesday, 6 September 2017 06:36:42 CEST Sunny Yun wrote:

> [...]



Re: FLINK-6117 issue work around

sunny yun
Nico, thank you for your reply.

> I looked at the commit you cherry-picked and nothing in there explains the
> error you got.

The commit I cherry-picked makes the 'zookeeper.sasl.disable' setting work correctly; I changed flink-dist_2.11-1.2.0.jar according to it, so the zookeeper.sasl problem is now gone. Yes, the error log I posted in the original message is a completely different one.


> Can you verify that nothing of your flink 1.3 tests remains

Below is what I just reproduced. I have a 4-node, non-secure cluster. After running yarn-session.sh, a JobManager process is created on node flink-03, but no TaskManager processes appear. Standalone mode works fine in the same setup. Any clue would be really appreciated. Thanks.
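Since the client times out while retrieving the leader from ZooKeeper, one thing worth inspecting is the znode it polls. The path below is reconstructed from the HA settings in this cluster's configuration; the trailing `/leader` is Flink's default for `high-availability.zookeeper.path.leader` (an assumption if you have overridden it):

```shell
# Reconstruct the ZooKeeper znode path used for leader retrieval.
ha_root="/flink"          # high-availability.zookeeper.path.root
ha_namespace="/cluster_one"  # high-availability.zookeeper.path.namespace
leader_path="${ha_root}${ha_namespace}/leader"
echo "$leader_path"   # → /flink/cluster_one/leader

# Inspect it manually against one of the quorum nodes, e.g.:
#   zkCli.sh -server flink-01:2181
#   ls /flink/cluster_one
#   get /flink/cluster_one/leader
```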


[bistel@flink-01 ~]$ jps
1888 ResourceManager
2000 NodeManager
2433 NameNode
2546 DataNode
2754 SecondaryNameNode
2891 Jps
1724 QuorumPeerMain

[bistel@flink-02 ~]$ jps
2018 Jps
1721 NodeManager
1881 DataNode
1515 QuorumPeerMain

[bistel@flink-03 ~]$ jps
1521 QuorumPeerMain
1975 Jps
1724 NodeManager
1885 DataNode

[bistel@flink-04 ~]$ jps
2090 Jps
1515 QuorumPeerMain
1789 NodeManager
1950 DataNode

[bistel@flink-01 ~]$ /usr/local/flink-1.2.0/bin/yarn-session.sh -n 4
2017-09-07 09:49:35,467 INFO  org.apache.flink.configuration.GlobalConfiguration  - Loading configuration property: jobmanager.rpc.address, flink-01
2017-09-07 09:49:35,468 INFO  org.apache.flink.configuration.GlobalConfiguration  - Loading configuration property: jobmanager.rpc.port, 6123
2017-09-07 09:49:35,468 INFO  org.apache.flink.configuration.GlobalConfiguration  - Loading configuration property: jobmanager.heap.mb, 4096
2017-09-07 09:49:35,468 INFO  org.apache.flink.configuration.GlobalConfiguration  - Loading configuration property: taskmanager.heap.mb, 8192
2017-09-07 09:49:35,468 INFO  org.apache.flink.configuration.GlobalConfiguration  - Loading configuration property: taskmanager.numberOfTaskSlots, 4
2017-09-07 09:49:35,469 INFO  org.apache.flink.configuration.GlobalConfiguration  - Loading configuration property: taskmanager.memory.preallocate, false
2017-09-07 09:49:35,469 INFO  org.apache.flink.configuration.GlobalConfiguration  - Loading configuration property: parallelism.default, 4
2017-09-07 09:49:35,469 INFO  org.apache.flink.configuration.GlobalConfiguration  - Loading configuration property: jobmanager.web.port, 8081
2017-09-07 09:49:35,469 INFO  org.apache.flink.configuration.GlobalConfiguration  - Loading configuration property: fs.hdfs.hadoopconf, /usr/local/hadoop/etc/hadoop/
2017-09-07 09:49:35,470 INFO  org.apache.flink.configuration.GlobalConfiguration  - Loading configuration property: high-availability, zookeeper
2017-09-07 09:49:35,470 INFO  org.apache.flink.configuration.GlobalConfiguration  - Loading configuration property: high-availability.zookeeper.quorum, flink-01:2181,flink-02:2181,flink-03:2181,flink-04:2181
2017-09-07 09:49:35,470 INFO  org.apache.flink.configuration.GlobalConfiguration  - Loading configuration property: high-availability.zookeeper.path.root, /flink
2017-09-07 09:49:35,470 INFO  org.apache.flink.configuration.GlobalConfiguration  - Loading configuration property: high-availability.zookeeper.path.namespace, /cluster_one
2017-09-07 09:49:35,470 INFO  org.apache.flink.configuration.GlobalConfiguration  - Loading configuration property: high-availability.zookeeper.storageDir, hdfs:///flink/recovery
2017-09-07 09:49:35,470 INFO  org.apache.flink.configuration.GlobalConfiguration  - Loading configuration property: yarn.application-attempts, 10
2017-09-07 09:49:35,470 INFO  org.apache.flink.configuration.GlobalConfiguration  - Loading configuration property: yarn.containers.vcores, 20
2017-09-07 09:49:35,471 INFO  org.apache.flink.configuration.GlobalConfiguration  - Loading configuration property: yarn.application-master.env.LD_LIBRARY_PATH, /opt/tibco/TIBRV/8.0/lib
2017-09-07 09:49:35,471 INFO  org.apache.flink.configuration.GlobalConfiguration  - Loading configuration property: yarn.taskmanager.env.LD_LIBRARY_PATH, /opt/tibco/TIBRV/8.0/lib
2017-09-07 09:49:35,471 INFO  org.apache.flink.configuration.GlobalConfiguration  - Loading configuration property: zookeeper.sasl.disable, true
2017-09-07 09:49:35,662 INFO  org.apache.flink.configuration.GlobalConfiguration  - Loading configuration property: jobmanager.rpc.address, flink-01
2017-09-07 09:49:35,662 INFO  org.apache.flink.configuration.GlobalConfiguration  - Loading configuration property: jobmanager.rpc.port, 6123
2017-09-07 09:49:35,662 INFO  org.apache.flink.configuration.GlobalConfiguration  - Loading configuration property: jobmanager.heap.mb, 4096
2017-09-07 09:49:35,663 INFO  org.apache.flink.configuration.GlobalConfiguration  - Loading configuration property: taskmanager.heap.mb, 8192
2017-09-07 09:49:35,663 INFO  org.apache.flink.configuration.GlobalConfiguration  - Loading configuration property: taskmanager.numberOfTaskSlots, 4
2017-09-07 09:49:35,663 INFO  org.apache.flink.configuration.GlobalConfiguration  - Loading configuration property: taskmanager.memory.preallocate, false
2017-09-07 09:49:35,663 INFO  org.apache.flink.configuration.GlobalConfiguration  - Loading configuration property: parallelism.default, 4
2017-09-07 09:49:35,663 INFO  org.apache.flink.configuration.GlobalConfiguration  - Loading configuration property: jobmanager.web.port, 8081
2017-09-07 09:49:35,663 INFO  org.apache.flink.configuration.GlobalConfiguration  - Loading configuration property: fs.hdfs.hadoopconf, /usr/local/hadoop/etc/hadoop/
2017-09-07 09:49:35,664 INFO  org.apache.flink.configuration.GlobalConfiguration  - Loading configuration property: high-availability, zookeeper
2017-09-07 09:49:35,664 INFO  org.apache.flink.configuration.GlobalConfiguration  - Loading configuration property: high-availability.zookeeper.quorum, flink-01:2181,flink-02:2181,flink-03:2181,flink-04:2181
2017-09-07 09:49:35,664 INFO  org.apache.flink.configuration.GlobalConfiguration  - Loading configuration property: high-availability.zookeeper.path.root, /flink
2017-09-07 09:49:35,664 INFO  org.apache.flink.configuration.GlobalConfiguration  - Loading configuration property: high-availability.zookeeper.path.namespace, /cluster_one
2017-09-07 09:49:35,664 INFO  org.apache.flink.configuration.GlobalConfiguration  - Loading configuration property: high-availability.zookeeper.storageDir, hdfs:///flink/recovery
2017-09-07 09:49:35,664 INFO  org.apache.flink.configuration.GlobalConfiguration  - Loading configuration property: yarn.application-attempts, 10
2017-09-07 09:49:35,664 INFO  org.apache.flink.configuration.GlobalConfiguration  - Loading configuration property: yarn.containers.vcores, 20
2017-09-07 09:49:35,664 INFO  org.apache.flink.configuration.GlobalConfiguration  - Loading configuration property: yarn.application-master.env.LD_LIBRARY_PATH, /opt/tibco/TIBRV/8.0/lib
2017-09-07 09:49:35,665 INFO  org.apache.flink.configuration.GlobalConfiguration  - Loading configuration property: yarn.taskmanager.env.LD_LIBRARY_PATH, /opt/tibco/TIBRV/8.0/lib
2017-09-07 09:49:35,665 INFO  org.apache.flink.configuration.GlobalConfiguration  - Loading configuration property: zookeeper.sasl.disable, true
2017-09-07 09:49:36,519 WARN  org.apache.hadoop.util.NativeCodeLoader  - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
2017-09-07 09:49:36,779 INFO  org.apache.flink.runtime.security.modules.HadoopModule  - Hadoop user set to bistel (auth:SIMPLE)
2017-09-07 09:49:37,084 INFO  org.apache.flink.configuration.GlobalConfiguration  - Loading configuration property: jobmanager.rpc.address, flink-01
2017-09-07 09:49:37,084 INFO  org.apache.flink.configuration.GlobalConfiguration  - Loading configuration property: jobmanager.rpc.port, 6123
2017-09-07 09:49:37,084 INFO  org.apache.flink.configuration.GlobalConfiguration  - Loading configuration property: jobmanager.heap.mb, 4096
2017-09-07 09:49:37,084 INFO  org.apache.flink.configuration.GlobalConfiguration  - Loading configuration property: taskmanager.heap.mb, 8192
2017-09-07 09:49:37,084 INFO  org.apache.flink.configuration.GlobalConfiguration  - Loading configuration property: taskmanager.numberOfTaskSlots, 4
2017-09-07 09:49:37,084 INFO  org.apache.flink.configuration.GlobalConfiguration  - Loading configuration property: taskmanager.memory.preallocate, false
2017-09-07 09:49:37,085 INFO  org.apache.flink.configuration.GlobalConfiguration  - Loading configuration property: parallelism.default, 4
2017-09-07 09:49:37,085 INFO  org.apache.flink.configuration.GlobalConfiguration  - Loading configuration property: jobmanager.web.port, 8081
2017-09-07 09:49:37,085 INFO  org.apache.flink.configuration.GlobalConfiguration  - Loading configuration property: fs.hdfs.hadoopconf, /usr/local/hadoop/etc/hadoop/
2017-09-07 09:49:37,085 INFO  org.apache.flink.configuration.GlobalConfiguration  - Loading configuration property: high-availability, zookeeper
2017-09-07 09:49:37,085 INFO  org.apache.flink.configuration.GlobalConfiguration  - Loading configuration property: high-availability.zookeeper.quorum, flink-01:2181,flink-02:2181,flink-03:2181,flink-04:2181
2017-09-07 09:49:37,085 INFO  org.apache.flink.configuration.GlobalConfiguration  - Loading configuration property: high-availability.zookeeper.path.root, /flink
2017-09-07 09:49:37,085 INFO  org.apache.flink.configuration.GlobalConfiguration  - Loading configuration property: high-availability.zookeeper.path.namespace, /cluster_one
2017-09-07 09:49:37,085 INFO  org.apache.flink.configuration.GlobalConfiguration  - Loading configuration property: high-availability.zookeeper.storageDir, hdfs:///flink/recovery
2017-09-07 09:49:37,086 INFO  org.apache.flink.configuration.GlobalConfiguration  - Loading configuration property: yarn.application-attempts, 10
2017-09-07 09:49:37,086 INFO  org.apache.flink.configuration.GlobalConfiguration  - Loading configuration property: yarn.containers.vcores, 20
2017-09-07 09:49:37,086 INFO  org.apache.flink.configuration.GlobalConfiguration  - Loading configuration property: yarn.application-master.env.LD_LIBRARY_PATH, /opt/tibco/TIBRV/8.0/lib
2017-09-07 09:49:37,086 INFO  org.apache.flink.configuration.GlobalConfiguration  - Loading configuration property: yarn.taskmanager.env.LD_LIBRARY_PATH, /opt/tibco/TIBRV/8.0/lib
2017-09-07 09:49:37,086 INFO  org.apache.flink.configuration.GlobalConfiguration  - Loading configuration property: zookeeper.sasl.disable, true
2017-09-07 09:49:37,103 INFO  org.apache.flink.yarn.YarnClusterDescriptor  - Using values:
2017-09-07 09:49:37,103 INFO  org.apache.flink.yarn.YarnClusterDescriptor  -   TaskManager count = 4
2017-09-07 09:49:37,103 INFO  org.apache.flink.yarn.YarnClusterDescriptor  -   JobManager memory = 1024
2017-09-07 09:49:37,103 INFO  org.apache.flink.yarn.YarnClusterDescriptor  -   TaskManager memory = 1024
2017-09-07 09:49:37,118 INFO  org.apache.hadoop.yarn.client.RMProxy  - Connecting to ResourceManager at flink-01/10.1.0.4:8032
2017-09-07 09:49:39,084 INFO  org.apache.flink.yarn.Utils  - Copying from file:/usr/local/flink-1.2.0/lib to hdfs://flink-01:9000/user/bistel/.flink/application_1504745288687_0001/lib
2017-09-07 09:49:43,419 INFO  org.apache.flink.yarn.Utils  - Copying from file:/usr/local/flink-1.2.0/conf/log4j.properties to hdfs://flink-01:9000/user/bistel/.flink/application_1504745288687_0001/log4j.properties
2017-09-07 09:49:43,552 INFO  org.apache.flink.yarn.Utils  - Copying from file:/usr/local/flink-1.2.0/lib/flink-dist_2.11-1.2.0.jar to hdfs://flink-01:9000/user/bistel/.flink/application_1504745288687_0001/flink-dist_2.11-1.2.0.jar
2017-09-07 09:49:43,816 INFO  org.apache.flink.yarn.Utils  - Copying from /usr/local/flink-1.2.0/conf/flink-conf.yaml to hdfs://flink-01:9000/user/bistel/.flink/application_1504745288687_0001/flink-conf.yaml
2017-09-07 09:49:43,903 INFO  org.apache.flink.yarn.YarnClusterDescriptor  - Submitting application master application_1504745288687_0001
2017-09-07 09:49:44,011 INFO  org.apache.hadoop.yarn.client.api.impl.YarnClientImpl  - Submitted application application_1504745288687_0001
2017-09-07 09:49:44,011 INFO  org.apache.flink.yarn.YarnClusterDescriptor  - Waiting for the cluster to be allocated
2017-09-07 09:49:44,030 INFO  org.apache.flink.yarn.YarnClusterDescriptor  - Deploying cluster, current state ACCEPTED
2017-09-07 09:49:50,326 INFO  org.apache.flink.yarn.YarnClusterDescriptor  - YARN application has been deployed successfully.
Exception in thread "main" java.lang.RuntimeException: Failed to retrieve JobManager address
        at org.apache.flink.client.program.ClusterClient.getJobManagerAddress(ClusterClient.java:248)
        at org.apache.flink.yarn.cli.FlinkYarnSessionCli.run(FlinkYarnSessionCli.java:627)
        at org.apache.flink.yarn.cli.FlinkYarnSessionCli$1.call(FlinkYarnSessionCli.java:476)
        at org.apache.flink.yarn.cli.FlinkYarnSessionCli$1.call(FlinkYarnSessionCli.java:473)
        at org.apache.flink.runtime.security.HadoopSecurityContext$1.run(HadoopSecurityContext.java:43)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:422)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1656)
        at org.apache.flink.runtime.security.HadoopSecurityContext.runSecured(HadoopSecurityContext.java:40)
        at org.apache.flink.yarn.cli.FlinkYarnSessionCli.main(FlinkYarnSessionCli.java:473)
Caused by: org.apache.flink.runtime.leaderretrieval.LeaderRetrievalException: Could not retrieve the leader address and leader session ID.
        at org.apache.flink.runtime.util.LeaderRetrievalUtils.retrieveLeaderConnectionInfo(LeaderRetrievalUtils.java:175)
        at org.apache.flink.client.program.ClusterClient.getJobManagerAddress(ClusterClient.java:242)
        ... 9 more
Caused by: java.util.concurrent.TimeoutException: Futures timed out after [60000 milliseconds]
        at scala.concurrent.impl.Promise$DefaultPromise.ready(Promise.scala:219)
        at scala.concurrent.impl.Promise$DefaultPromise.result(Promise.scala:223)
        at scala.concurrent.Await$$anonfun$result$1.apply(package.scala:190)
        at scala.concurrent.BlockContext$DefaultBlockContext$.blockOn(BlockContext.scala:53)
        at scala.concurrent.Await$.result(package.scala:190)
        at scala.concurrent.Await.result(package.scala)
        at org.apache.flink.runtime.util.LeaderRetrievalUtils.retrieveLeaderConnectionInfo(LeaderRetrievalUtils.java:173)
        ... 10 more
2017-09-07 09:50:51,519 INFO  org.apache.flink.yarn.YarnClusterClient  - Shutting down YarnClusterClient from the client shutdown hook
2017-09-07 09:50:51,519 INFO  org.apache.flink.yarn.YarnClusterClient  - Sending shutdown request to the Application Master
2017-09-07 09:50:51,549 INFO  org.apache.flink.yarn.YarnClusterClient  - Start application client.
2017-09-07 09:50:51,549 INFO  org.apache.flink.yarn.YarnClusterClient  - Starting client actor system.
2017-09-07 09:50:51,807 INFO  akka.event.slf4j.Slf4jLogger  - Slf4jLogger started
2017-09-07 09:50:51,836 INFO  Remoting  - Starting remoting
2017-09-07 09:50:51,936 INFO  Remoting  - Remoting started; listening on addresses :[akka.tcp://flink@flink-01:45463]
2017-09-07 09:50:51,954 INFO  org.apache.flink.yarn.ApplicationClient  - Sending StopCluster request to JobManager.
2017-09-07 09:50:52,967 INFO  org.apache.flink.yarn.ApplicationClient  - Sending StopCluster request to JobManager.
2017-09-07 09:50:53,986 INFO  org.apache.flink.yarn.ApplicationClient  - Sending StopCluster request to JobManager.
2017-09-07 09:50:55,007 INFO  org.apache.flink.yarn.ApplicationClient  - Sending StopCluster request to JobManager.
2017-09-07 09:50:56,026 INFO  org.apache.flink.yarn.ApplicationClient  - Sending StopCluster request to JobManager.
2017-09-07 09:50:57,047 INFO  org.apache.flink.yarn.ApplicationClient  - Sending StopCluster request to JobManager.
2017-09-07 09:50:58,067 INFO  org.apache.flink.yarn.ApplicationClient  - Sending StopCluster request to JobManager.
2017-09-07 09:50:59,087 INFO  org.apache.flink.yarn.ApplicationClient  - Sending StopCluster request to JobManager.
2017-09-07 09:51:00,107 INFO  org.apache.flink.yarn.ApplicationClient  - Sending StopCluster request to JobManager.
2017-09-07 09:51:01,127 INFO  org.apache.flink.yarn.ApplicationClient  - Sending StopCluster request to JobManager.
2017-09-07 09:51:01,944 WARN  org.apache.flink.yarn.YarnClusterClient  - Error while stopping YARN cluster.
java.util.concurrent.TimeoutException: Futures timed out after [10000 milliseconds]
        at scala.concurrent.impl.Promise$DefaultPromise.ready(Promise.scala:219)
        at scala.concurrent.impl.Promise$DefaultPromise.ready(Promise.scala:153)
        at scala.concurrent.Await$$anonfun$ready$1.apply(package.scala:169)
        at scala.concurrent.Await$$anonfun$ready$1.apply(package.scala:169)
        at scala.concurrent.BlockContext$DefaultBlockContext$.blockOn(BlockContext.scala:53)
        at scala.concurrent.Await$.ready(package.scala:169)
        at scala.concurrent.Await.ready(package.scala)
        at org.apache.flink.yarn.YarnClusterClient.shutdownCluster(YarnClusterClient.java:372)
        at org.apache.flink.yarn.YarnClusterClient.finalizeCluster(YarnClusterClient.java:342)
        at org.apache.flink.client.program.ClusterClient.shutdown(ClusterClient.java:208)
        at org.apache.flink.yarn.YarnClusterClient$ClientShutdownHook.run(YarnClusterClient.java:446)
2017-09-07 09:51:01,946 INFO  org.apache.flink.yarn.YarnClusterClient  - Deleted Yarn properties file at /tmp/.yarn-properties-bistel
2017-09-07 09:51:01,946 INFO  org.apache.flink.yarn.YarnClusterClient  - Deleting files in hdfs://flink-01:9000/user/bistel/.flink/application_1504745288687_0001
2017-09-07 09:51:02,146 INFO  org.apache.flink.yarn.ApplicationClient  - Sending StopCluster request to JobManager.
2017-09-07 09:51:02,490 INFO  org.apache.flink.yarn.YarnClusterClient  - Application application_1504745288687_0001 finished with state RUNNING and final state UNDEFINED at 0
2017-09-07 09:51:02,490 INFO  org.apache.flink.yarn.YarnClusterClient  - YARN Client is shutting down
2017-09-07 09:51:02,598 INFO  org.apache.flink.yarn.ApplicationClient  - Stopped Application client.
2017-09-07 09:51:02,599 INFO  org.apache.flink.yarn.ApplicationClient  - Disconnect from JobManager null.
2017-09-07 09:51:02,633 INFO  akka.remote.RemoteActorRefProvider$RemotingTerminator  - Shutting down remote daemon.
2017-09-07 09:51:02,635 INFO  akka.remote.RemoteActorRefProvider$RemotingTerminator  - Remote daemon shut down; proceeding with flushing remote transports.
2017-09-07 09:51:02,651 INFO  akka.remote.RemoteActorRefProvider$RemotingTerminator  - Remoting shut down.

[bistel@flink-01 ~]$ jps
1888 ResourceManager
2000 NodeManager
2433 NameNode
2546 DataNode
2754 SecondaryNameNode
3143 Jps
1724 QuorumPeerMain

[bistel@flink-02 ~]$ jps
2018 Jps
1721 NodeManager
1881 DataNode
1515 QuorumPeerMain

[bistel@flink-03 ~]$ jps
1521 QuorumPeerMain
2054 YarnApplicationMasterRunner
1724 NodeManager
1885 DataNode
2142 Jps

[bistel@flink-04 ~]$ jps
2090 Jps
1515 QuorumPeerMain
1789 NodeManager
1950 DataNode



Nico Kruber wrote:

> [...]





--
Sent from: http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/