Hi,

I am running into some strange issues on YARN with Flink 1.1.3 and 1.1.4. For some reason I started getting the error shown in the logs below.

The job manager starts and the application is in the ACCEPTED state, but it cannot seem to communicate with the scheduler. (0.0.0.0:8030 seems strange.)

I didn't change anything on the YARN cluster and this worked previously, but I just can't get it to work now. The yarn-site.xml contains the proper RM addresses.

Does anybody have any ideas where to go from here?

Cheers,
Gyula

JM log:

2016-11-12 11:56:06,894 DEBUG org.apache.hadoop.ipc.Client - The ping interval is 60000 ms.
2016-11-12 11:56:06,894 DEBUG org.apache.hadoop.ipc.Client - Connecting to /0.0.0.0:8030
2016-11-12 11:56:06,899 DEBUG org.apache.hadoop.ipc.Client - closing ipc connection to 0.0.0.0/0.0.0.0:8030: Connection refused

java.net.ConnectException: Call From splat24.sto.midasplayer.com/172.25.86.166 to 0.0.0.0:8030 failed on connection exception: java.net.ConnectException: Connection refused; For more details see: http://wiki.apache.org/hadoop/ConnectionRefused
    at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
    at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
    at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
    at java.lang.reflect.Constructor.newInstance(Constructor.java:422)
    at org.apache.hadoop.net.NetUtils.wrapWithMessage(NetUtils.java:783)
    at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:730)
    at org.apache.hadoop.ipc.Client.call(Client.java:1410)
    at org.apache.hadoop.ipc.Client.call(Client.java:1359)
    at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:206)
    at com.sun.proxy.$Proxy8.registerApplicationMaster(Unknown Source)
    at org.apache.hadoop.yarn.api.impl.pb.client.ApplicationMasterProtocolPBClientImpl.registerApplicationMaster(ApplicationMasterProtocolPBClientImpl.java:106)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:497)
    at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:186)
    at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102)
    at com.sun.proxy.$Proxy9.registerApplicationMaster(Unknown Source)
    at org.apache.hadoop.yarn.client.api.impl.AMRMClientImpl.registerApplicationMaster(AMRMClientImpl.java:196)
    at org.apache.hadoop.yarn.client.api.async.impl.AMRMClientAsyncImpl.registerApplicationMaster(AMRMClientAsyncImpl.java:138)
    at org.apache.flink.yarn.YarnFlinkResourceManager.initialize(YarnFlinkResourceManager.java:259)
    at org.apache.flink.runtime.clusterframework.FlinkResourceManager.preStart(FlinkResourceManager.java:185)
    at akka.actor.Actor$class.aroundPreStart(Actor.scala:470)
    at akka.actor.UntypedActor.aroundPreStart(UntypedActor.scala:97)
    at akka.actor.ActorCell.create(ActorCell.scala:580)
    at akka.actor.ActorCell.invokeAll$1(ActorCell.scala:456)
    at akka.actor.ActorCell.systemInvoke(ActorCell.scala:478)

Client:

2016-11-12 12:31:31,080 INFO org.apache.flink.yarn.cli.FlinkYarnSessionCli - No path for the flink jar passed. Using the location of class org.apache.flink.yarn.YarnClusterDescriptor to locate the jar
2016-11-12 12:31:31,080 INFO org.apache.flink.yarn.cli.FlinkYarnSessionCli - No path for the flink jar passed. Using the location of class org.apache.flink.yarn.YarnClusterDescriptor to locate the jar
2016-11-12 12:31:31,101 INFO org.apache.flink.yarn.YarnClusterDescriptor - Using values:
2016-11-12 12:31:31,101 INFO org.apache.flink.yarn.YarnClusterDescriptor - TaskManager count = 1
2016-11-12 12:31:31,101 INFO org.apache.flink.yarn.YarnClusterDescriptor - JobManager memory = 1024
2016-11-12 12:31:31,102 INFO org.apache.flink.yarn.YarnClusterDescriptor - TaskManager memory = 11000
2016-11-12 12:31:31,119 INFO org.apache.hadoop.yarn.client.RMProxy - Connecting to ResourceManager at /0.0.0.0:8032
2016-11-12 12:31:31,394 WARN org.apache.flink.yarn.YarnClusterDescriptor - The file system scheme is 'file'. This indicates that the specified Hadoop configuration path is wrong and the system is using the default Hadoop configuration values.The Flink YARN client needs to store its files in a distributed file system
2016-11-12 12:31:31,457 INFO org.apache.flink.yarn.Utils - Copying from file:/fjord/sites/flink-1.1.3/conf/log4j.properties to file:/fjord/splat/flink/yarn/.flink/application_1478896050022_0013/log4j.properties
2016-11-12 12:31:42,321 INFO org.apache.flink.yarn.Utils - Copying from file:/fjord/sites/flink-1.1.3/lib to file:/fjord/splat/flink/yarn/.flink/application_1478896050022_0013/lib
2016-11-12 12:32:18,457 INFO org.apache.flink.yarn.Utils - Copying from file:/fjord/sites/rbea/rbea-on-flink-1.0-SNAPSHOT.jar to file:/fjord/splat/flink/yarn/.flink/application_1478896050022_0013/rbea-on-flink-1.0-SNAPSHOT.jar
2016-11-12 12:32:39,725 INFO org.apache.flink.yarn.Utils - Copying from file:/fjord/sites/flink-1.1.3/lib/flink-dist_2.10-1.1.4.jar to file:/fjord/splat/flink/yarn/.flink/application_1478896050022_0013/flink-dist_2.10-1.1.4.jar
2016-11-12 12:32:58,154 INFO org.apache.flink.yarn.Utils - Copying from /fjord/sites/flink-1.1.3/conf/flink-conf.yaml to file:/fjord/splat/flink/yarn/.flink/application_1478896050022_0013/flink-conf.yaml
2016-11-12 12:33:02,218 INFO org.apache.flink.yarn.YarnClusterDescriptor - Submitting application master application_1478896050022_0013
2016-11-12 12:33:02,256 INFO org.apache.hadoop.yarn.client.api.impl.YarnClientImpl - Submitted application application_1478896050022_0013
2016-11-12 12:33:02,257 INFO org.apache.flink.yarn.YarnClusterDescriptor - Waiting for the cluster to be allocated
2016-11-12 12:33:02,259 INFO org.apache.flink.yarn.YarnClusterDescriptor - Deploying cluster, current state ACCEPTED
2016-11-12 12:34:02,485 INFO org.apache.flink.yarn.YarnClusterDescriptor - Deployment took more than 60 seconds. Please check if the requested resources are available in the YARN cluster
|
Hi,

What happened is that I compiled Flink with the wrong Hadoop version...

Sorry :)
Gyula

Gyula Fóra <[hidden email]> wrote (Sat, 12 Nov 2016, 13:11):
|
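A note on the symptoms in the logs of the first message: 0.0.0.0 is the default ResourceManager host in YARN's built-in configuration, so connections to 0.0.0.0:8030 and 0.0.0.0:8032, together with the "file system scheme is 'file'" warning, suggest that the client fell back to default Hadoop settings. The following is only a rough sketch of how such lines could be spotted in a log automatically. The function name and hint texts are hypothetical, and a match does not prove the root cause (in this thread it turned out to be a Flink build against the wrong Hadoop version).

```python
import re

# Matches Hadoop client log lines such as
#   "... RMProxy - Connecting to ResourceManager at /0.0.0.0:8032"
#   "... ipc.Client - Connecting to /0.0.0.0:8030"
WILDCARD_CONNECT = re.compile(
    r"Connecting to (?:ResourceManager at )?/0\.0\.0\.0:(\d+)"
)
FILE_SCHEME_WARNING = "The file system scheme is 'file'"


def find_default_config_hints(log_text):
    """Return hints that default Hadoop settings may be in use (hypothetical helper)."""
    hints = []
    for line in log_text.splitlines():
        match = WILDCARD_CONNECT.search(line)
        if match:
            hints.append("connects to wildcard address 0.0.0.0:" + match.group(1))
        if FILE_SCHEME_WARNING in line:
            hints.append("file system scheme is 'file'")
    return hints


# Sample lines copied (shortened) from the logs in this thread.
sample = """\
2016-11-12 11:56:06,894 DEBUG org.apache.hadoop.ipc.Client - Connecting to /0.0.0.0:8030
2016-11-12 12:31:31,119 INFO org.apache.hadoop.yarn.client.RMProxy - Connecting to ResourceManager at /0.0.0.0:8032
2016-11-12 12:31:31,394 WARN org.apache.flink.yarn.YarnClusterDescriptor - The file system scheme is 'file'.
2016-11-12 12:33:02,259 INFO org.apache.flink.yarn.YarnClusterDescriptor - Deploying cluster, current state ACCEPTED
"""

for hint in find_default_config_hints(sample):
    print(hint)
```

Running this kind of check on both the client log and the JobManager log is useful because, as the thread shows, the two sides can report the problem at different log levels.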
Good to know that you solved this. :) Do you think there is something we can do to help users notice this situation faster?
– Ufuk
|
Hi,

The main problem was that whatever was going wrong was not apparent in the Flink application master runner; it was only shown in the YarnClient debug log.

If you run with the default INFO log level, all you see is that the YARN client is trying to fail over again and again, as if something were wrong with the resource manager. Setting the level to DEBUG actually shows the error.

Also, it would be great if there were a way to verify YARN versions and detect incompatibilities; I'm not sure whether this is easily possible.

Gyula

Ufuk Celebi <[hidden email]> wrote (Mon, 14 Nov 2016, 9:42):
> Good to know that you solved this. :) Do you think there is something we can do to help users noticing this situation faster?
|
What was the log message shown on DEBUG level?
Maybe it makes sense to promote it to INFO. ;)

I guess there is no easy way to verify the version, right Max or Robert?
|
What I mean is the logs coming from org.apache.hadoop.ipc.Client, if you look at the JM log in my original email.

Gyula

Ufuk Celebi <[hidden email]> wrote (Mon, 14 Nov 2016, 10:52):
> What was the log message shown on DEBUG level?
|
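For readers who want to see these Hadoop IPC messages without switching the whole application to DEBUG, the level can be raised for that one logger only. This is a minimal sketch, assuming the log4j 1.x properties format of the conf/log4j.properties file that the client log shows being shipped to the cluster; only the logger name is taken from the thread.

```
# Assumption: added to Flink's conf/log4j.properties (log4j 1.x syntax).
# Surfaces Hadoop IPC client messages such as "Connecting to /0.0.0.0:8030"
# and "closing ipc connection ... Connection refused".
log4j.logger.org.apache.hadoop.ipc.Client=DEBUG
```

The rest of the logging configuration stays at its existing level, so the extra output is limited to the RPC client that reported the connection failure here.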
Ah, sorry. I thought it was something related to Flink. ;)
|