Hi All,

I am new to Flink. Does anyone have an idea why my Flink app's JobManager is stuck? Here is the bottom part of the JobManager log. Any suggestions will be appreciated.

2020-07-15 16:49:52,749 INFO org.apache.flink.runtime.rpc.akka.AkkaRpcService - Starting RPC endpoint for org.apache.flink.runtime.dispatcher.StandaloneDispatcher at akka://flink/user/dispatcher .
2020-07-15 16:49:52,759 INFO org.apache.flink.runtime.leaderretrieval.ZooKeeperLeaderRetrievalService - Starting ZooKeeperLeaderRetrievalService /leader/resource_manager_lock.
2020-07-15 16:49:52,759 INFO org.apache.flink.runtime.leaderretrieval.ZooKeeperLeaderRetrievalService - Starting ZooKeeperLeaderRetrievalService /leader/dispatcher_lock.
2020-07-15 16:49:52,762 INFO org.apache.flink.runtime.leaderelection.ZooKeeperLeaderElectionService - Starting ZooKeeperLeaderElectionService ZooKeeperLeaderElectionService{leaderPath='/leader/dispatcher_lock'}.
2020-07-15 16:49:52,790 INFO org.apache.flink.runtime.dispatcher.StandaloneDispatcher - Dispatcher /user/dispatcher was granted leadership with fencing token
2020-07-15 16:49:52,791 INFO org.apache.flink.runtime.dispatcher.StandaloneDispatcher - Recovering all persisted jobs.
2020-07-15 16:49:52,931 INFO org.apache.hadoop.yarn.client.ConfiguredRMFailoverProxyProvider - Failing over to rm1
2020-07-15 16:49:53,014 INFO org.apache.flink.yarn.YarnResourceManager - Recovered 0 containers from previous attempts ([]).
2020-07-15 16:49:53,018 INFO org.apache.hadoop.yarn.client.api.async.impl.NMClientAsyncImpl - Upper bound of the thread pool size is 500
2020-07-15 16:49:53,020 INFO org.apache.hadoop.yarn.client.api.impl.ContainerManagementProtocolProxy - yarn.client.max-cached-nodemanagers-proxies : 0
2020-07-15 16:49:53,021 INFO org.apache.flink.runtime.leaderelection.ZooKeeperLeaderElectionService - Starting ZooKeeperLeaderElectionService ZooKeeperLeaderElectionService{leaderPath='/leader/resource_manager_lock'}.
2020-07-15 16:49:53,042 INFO org.apache.flink.yarn.YarnResourceManager - ResourceManager akka.tcp://flink@cluster-dev-001/user/resourcemanager was granted leadership with fencing token
2020-07-15 16:49:53,046 INFO org.apache.flink.runtime.resourcemanager.slotmanager.SlotManagerImpl - Starting the SlotManager.
2020-07-15 16:50:52,217 INFO org.apache.kafka.clients.Metadata - Cluster ID: FZrfSqHiTpaZwEzIRYkCLQ

Thanks
Best regards
Rainie
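A log like the one above is easier to read once each entry is on its own line and the pauses between entries are visible. Below is a minimal sketch, assuming the `<date> <time,millis> <LEVEL> <logger> - <message>` layout that the quoted lines use; it splits a pasted run of entries back apart and reports the longest pause between consecutive entries. The helpers `split_flattened`, `parse_line`, and `largest_gap` are hypothetical names for illustration, not Flink APIs.

```python
import re
from datetime import datetime
from typing import Iterable, List, NamedTuple, Optional, Tuple

# Assumed entry layout, taken from the lines quoted above:
#   "2020-07-15 16:49:52,749 INFO org.example.Logger - message"
TS = r"\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}"
ENTRY_RE = re.compile(
    r"^(?P<ts>" + TS + r")\s+(?P<level>[A-Z]+)\s+(?P<logger>\S+)\s+-\s+(?P<msg>.*)$"
)
# Whitespace that is immediately followed by a new timestamp starts a new entry.
SPLIT_RE = re.compile(r"\s+(?=" + TS + r"\s)")


class Entry(NamedTuple):
    ts: datetime
    level: str
    logger: str
    msg: str


def split_flattened(text: str) -> List[str]:
    """Split a log that was pasted as one line back into individual entries."""
    return [part for part in SPLIT_RE.split(text.strip()) if part]


def parse_line(line: str) -> Optional[Entry]:
    """Parse one entry; returns None for stack frames or other non-entry lines."""
    m = ENTRY_RE.match(line.strip())
    if m is None:
        return None
    ts = datetime.strptime(m.group("ts"), "%Y-%m-%d %H:%M:%S,%f")
    return Entry(ts, m.group("level"), m.group("logger"), m.group("msg"))


def largest_gap(lines: Iterable[str]) -> Optional[Tuple[float, Entry, Entry]]:
    """Return (seconds, entry_before, entry_after) for the longest pause, if any."""
    entries = [e for e in map(parse_line, lines) if e is not None]
    best: Optional[Tuple[float, Entry, Entry]] = None
    for prev, cur in zip(entries, entries[1:]):
        gap = (cur.ts - prev.ts).total_seconds()
        if best is None or gap > best[0]:
            best = (gap, prev, cur)
    return best
```

For a log file, `largest_gap(open(path))` works line by line; for a pasted run like the one above, pass `split_flattened(text)` instead. A long pause only shows where output stopped, not why it stopped.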
Hi Rainie,

I am relatively new to Flink, but I suspect that your error is somewhere else in the log. I have found most of my problems by searching the log for the words “error” and “exception”. Since all of these log lines are at the INFO level, they are probably not highlighting any real issues. If you send more of the log, or find an error line, that might help others debug.

Thanks,
Jesse
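Jesse's search can be scripted. Below is a minimal sketch that lists every log line containing “error” or “exception” (case-insensitive), together with its line number. Also flagging WARN-level lines is an extra assumption, not part of the advice above, and the helper names `suspicious_lines` and `scan_file` are hypothetical.

```python
import re
from typing import Iterable, List, Tuple

# "error" / "exception" as suggested above; matching the WARN level as a whole
# word is an added assumption.
PATTERN = re.compile(r"error|exception|\bWARN\b", re.IGNORECASE)


def suspicious_lines(lines: Iterable[str]) -> List[Tuple[int, str]]:
    """Return (1-based line number, text) for every line matching PATTERN."""
    return [
        (number, line.rstrip("\n"))
        for number, line in enumerate(lines, start=1)
        if PATTERN.search(line)
    ]


def scan_file(path: str) -> List[Tuple[int, str]]:
    """Read a log file, replacing undecodable bytes, and scan it."""
    with open(path, encoding="utf-8", errors="replace") as handle:
        return suspicious_lines(handle)
```

Calling `scan_file("jobmanager.log")` (a hypothetical path) returns the matching lines, which are usually the ones worth posting.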
Thank you, Jesse. Here is more log info:

2020-07-15 18:19:36,456 INFO org.apache.flink.client.cli.CliFrontend - --------------------------------------------------------------------------------
2020-07-15 18:19:36,460 INFO org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: jobmanager.rpc.address, localhost
2020-07-15 18:19:36,460 INFO org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: jobmanager.rpc.port, 6123
2020-07-15 18:19:36,460 INFO org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: jobmanager.heap.size, 1024m
2020-07-15 18:19:36,460 INFO org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: taskmanager.heap.size, 1024m
2020-07-15 18:19:36,460 INFO org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: taskmanager.numberOfTaskSlots, 1
2020-07-15 18:19:36,460 INFO org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: parallelism.default, 1
2020-07-15 18:19:36,461 INFO org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: jobmanager.execution.failover-strategy, region
2020-07-15 18:19:36,463 WARN org.apache.flink.client.cli.CliFrontend - Could not load CLI class org.apache.flink.yarn.cli.FlinkYarnSessionCli.
java.lang.NoClassDefFoundError: org/apache/hadoop/yarn/exceptions/YarnException
	at java.lang.Class.forName0(Native Method)
	at java.lang.Class.forName(Class.java:264)
	at org.apache.flink.client.cli.CliFrontend.loadCustomCommandLine(CliFrontend.java:1185)
	at org.apache.flink.client.cli.CliFrontend.loadCustomCommandLines(CliFrontend.java:1145)
	at org.apache.flink.client.cli.CliFrontend.main(CliFrontend.java:1070)
Caused by: java.lang.ClassNotFoundException: org.apache.hadoop.yarn.exceptions.YarnException
	at java.net.URLClassLoader.findClass(URLClassLoader.java:382)
	at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
	at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:349)
	at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
	... 5 more
2020-07-15 18:19:36,519 INFO org.apache.flink.core.fs.FileSystem - Hadoop is not in the classpath/dependencies. The extended set of supported File Systems via Hadoop is not available.
2020-07-15 18:19:36,647 INFO org.apache.flink.runtime.security.modules.HadoopModuleFactory - Cannot create Hadoop Security Module because Hadoop cannot be found in the Classpath.
2020-07-15 18:19:36,658 INFO org.apache.flink.runtime.security.SecurityUtils - Cannot install HadoopSecurityContext because Hadoop cannot be found in the Classpath.

Best regards
Rainie

On Wed, Jul 15, 2020 at 11:49 AM Jesse Lord <[hidden email]> wrote:
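In the stack trace above, the last "Caused by:" entry is the root cause: a `ClassNotFoundException` for `org.apache.hadoop.yarn.exceptions.YarnException`, which matches the later "Hadoop cannot be found in the Classpath" lines. Below is a minimal sketch that pulls the exception chain out of a single Java stack trace so the root cause is easy to spot. It assumes the standard `Class: message` / `Caused by:` layout that Java prints; `exception_chain` is a hypothetical helper.

```python
import re
from typing import List, Tuple

# A header line looks like "java.lang.SomeError: message" or
# "Caused by: java.lang.SomeException: message"; stack frames start with "at".
HEADER_RE = re.compile(
    r"^(?:Caused by:\s+)?"
    r"(?P<cls>(?:[\w$]+\.)+[\w$]*(?:Error|Exception))"
    r"(?::\s*(?P<msg>.*))?$"
)


def exception_chain(trace: str) -> List[Tuple[str, str]]:
    """Return (exception class, message) pairs, outermost first, root cause last."""
    chain: List[Tuple[str, str]] = []
    for line in trace.splitlines():
        m = HEADER_RE.match(line.strip())
        if m:
            chain.append((m.group("cls"), m.group("msg") or ""))
    return chain
```

`exception_chain(trace)[-1]` gives the innermost cause. Pass one trace at a time; if a whole log file is passed, the chains of unrelated traces are concatenated.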
This is the console log after launching the app:

2020-07-15 19:25:28,507 INFO org.apache.flink.yarn.AbstractYarnClusterDescriptor - YARN application has been deployed successfully.
Starting execution of program
-------Environment Variables-----
DOCKER_CONFIG=/etc/.docker
FLINK_BIN_DIR=/usr/local/flink-1.9.1/bin
FLINK_CONF_DIR=/etc/flink-1.9.1/conf/
FLINK_LIB_DIR=/usr/local/flink-1.9.1/lib
FLINK_LOG_DIR=/home/karthik/pincohesion
FLINK_OPT_DIR=/usr/local/flink-1.9.1/opt
FLINK_PLUGINS_DIR=/usr/local/flink-1.9.1/plugins
HADOOP_CLASSPATH=/usr/local/hadoop/etc/hadoop:/usr/local/hadoop/share/hadoop/common/lib/*:/usr/local/hadoop/share/hadoop/common/*:/usr/local/hadoop/share/hadoop/hdfs:/usr/local/hadoop/share/hadoop/hdfs/lib/*:/usr/local/hadoop/share/hadoop/hdfs/*:/usr/local/hadoop/share/hadoop/yarn/lib/*:/usr/local/hadoop/share/hadoop/yarn/*:/usr/local/hadoop/share/hadoop/mapreduce/lib/*:/usr/local/hadoop/share/hadoop/mapreduce/*:/usr/local/hadoop/etc/hadoop:/usr/local/hadoop/share/hadoop/common/lib/*:/usr/local/hadoop/share/hadoop/common/*:/usr/local/hadoop/share/hadoop/hdfs:/usr/local/hadoop/share/hadoop/hdfs/lib/*:/usr/local/hadoop/share/hadoop/hdfs/*:/usr/local/hadoop/share/hadoop/yarn/lib/*:/usr/local/hadoop/share/hadoop/yarn/*:/usr/local/hadoop/share/hadoop/mapreduce/lib/*:/usr/local/hadoop/share/hadoop/mapreduce/*:/usr/local/hadoop/etc/hadoop:/usr/local/hadoop/share/hadoop/common/lib/*:/usr/local/hadoop/share/hadoop/common/*:/usr/local/hadoop/share/hadoop/hdfs:/usr/local/hadoop/share/hadoop/hdfs/lib/*:/usr/local/hadoop/share/hadoop/hdfs/*:/usr/local/hadoop/share/hadoop/yarn/lib/*:/usr/local/hadoop/share/hadoop/yarn/*:/usr/local/hadoop/share/hadoop/mapreduce/lib/*:/usr/local/hadoop/share/hadoop/mapreduce/*:/usr/local/hadoop/etc/hadoop:/usr/local/hadoop/share/hadoop/common/lib/*:/usr/local/hadoop/share/hadoop/common/*:/usr/local/hadoop/share/hadoop/hdfs:/usr/local/hadoop/share/hadoop/hdfs/lib/*:/usr/local/hadoop/share/hadoop/hdfs/*:/usr/local/hadoop/share/hadoop/yarn/lib/*:/usr/local/hadoop/share/hadoop/yarn/*:/usr/local/hadoop/share/hadoop/mapreduce/lib/*:/usr/local/hadoop/share/hadoop/mapreduce/*:/usr/local/hadoop/contrib/capacity-scheduler/*.jar:/usr/local/hadoop/share/hadoop/tools/lib/*:/usr/local/hadoop/contrib/capacity-scheduler/*.jar:/usr/local/hadoop/share/hadoop/tools/lib/*:/usr/local/hadoop/contrib/capacity-scheduler/*.jar:/usr/local/hadoop/share/hadoop/tools/lib/*:/usr/local/hadoop/contrib/capacity-scheduler/*.jar:/usr/local/hadoop/share/hadoop/tools/lib/*
HADOOP_CONF_DIR=/usr/local/hadoop/etc/hadoop
HADOOP_HOME=/usr/local/hadoop
HISTFILE=/home/rainieli/.bash_history
HISTFILESIZE=2000
HISTIGNORE=
HISTSIZE=1000
HOME=/home/rainieli
JAVA_HOME=/usr/lib/jvm/java-8-oracle
LANG=C.UTF-8
LC_TERMINAL=iTerm2
LC_TERMINAL_VERSION=3.3.9
LESSCLOSE=/usr/bin/lesspipe %s %s
LESSOPEN=| /usr/bin/lesspipe %s
LOGNAME=rainieli
LS_COLORS=rs=0:di=01;34:ln=01;36:mh=00:pi=40;33:so=01;35:do=01;35:bd=40;33;01:cd=40;33;01:or=40;31;01:mi=00:su=37;41:sg=30;43:ca=30;41:tw=30;42:ow=34;42:st=37;44:ex=01;32:*.tar=01;31:*.tgz=01;31:*.arc=01;31:*.arj=01;31:*.taz=01;31:*.lha=01;31:*.lz4=01;31:*.lzh=01;31:*.lzma=01;31:*.tlz=01;31:*.txz=01;31:*.tzo=01;31:*.t7z=01;31:*.zip=01;31:*.z=01;31:*.Z=01;31:*.dz=01;31:*.gz=01;31:*.lrz=01;31:*.lz=01;31:*.lzo=01;31:*.xz=01;31:*.zst=01;31:*.tzst=01;31:*.bz2=01;31:*.bz=01;31:*.tbz=01;31:*.tbz2=01;31:*.tz=01;31:*.deb=01;31:*.rpm=01;31:*.jar=01;31:*.war=01;31:*.ear=01;31:*.sar=01;31:*.rar=01;31:*.alz=01;31:*.ace=01;31:*.zoo=01;31:*.cpio=01;31:*.7z=01;31:*.rz=01;31:*.cab=01;31:*.wim=01;31:*.swm=01;31:*.dwm=01;31:*.esd=01;31:*.jpg=01;35:*.jpeg=01;35:*.mjpg=01;35:*.mjpeg=01;35:*.gif=01;35:*.bmp=01;35:*.pbm=01;35:*.pgm=01;35:*.ppm=01;35:*.tga=01;35:*.xbm=01;35:*.xpm=01;35:*.tif=01;35:*.tiff=01;35:*.png=01;35:*.svg=01;35:*.svgz=01;35:*.mng=01;35:*.pcx=01;35:*.mov=01;35:*.mpg=01;35:*.mpeg=01;35:*.m2v=01;35:*.mkv=01;35:*.webm=01;35:*.ogm=01;35:*.mp4=01;35:*.m4v=01;35:*.mp4v=01;35:*.vob=01;35:*.qt=01;35:*.nuv=01;35:*.wmv=01;35:*.asf=01;35:*.rm=01;35:*.rmvb=01;35:*.flc=01;35:*.avi=01;35:*.fli=01;35:*.flv=01;35:*.gl=01;35:*.dl=01;35:*.xcf=01;35:*.xwd=01;35:*.yuv=01;35:*.cgm=01;35:*.emf=01;35:*.ogv=01;35:*.ogx=01;35:*.aac=00;36:*.au=00;36:*.flac=00;36:*.m4a=00;36:*.mid=00;36:*.midi=00;36:*.mka=00;36:*.mp3=00;36:*.mpc=00;36:*.ogg=00;36:*.ra=00;36:*.wav=00;36:*.oga=00;36:*.opus=00;36:*.spx=00;36:*.xspf=00;36:
MAIL=/var/mail/rainieli
OLDPWD=/home/rainieli
PATH=/usr/lib/jvm/java-8-oracle/bin:/usr/lib/jvm/java-8-oracle/bin:/usr/lib/jvm/java-8-oracle/bin:/usr/lib/jvm/java-8-oracle/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/local/hadoop/bin:/usr/local/hadoop/bin
PWD=/home/karthik
SHELL=/bin/bash
SHLVL=1
SSH_CLIENT=172.16.11.92 64705 22
SSH_CONNECTION=172.16.11.92 64705 10.2.66.110 22
SSH_TTY=/dev/pts/2
S_COLORS=auto
TERM=xterm-256color
USER=rainieli
-------Command Line Arguments-----
[--conf-file, PIN_JOIN_pin_cohesion_realtime_signal.prod.properties]
Current working directory: null
....... (some serverset info here)

Thanks
Best regards
Rainie

On Wed, Jul 15, 2020 at 12:45 PM Rainie Li <[hidden email]> wrote:
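The dump above shows HADOOP_CLASSPATH set in this shell, with the same block of entries repeated, while the earlier CLI log could not load a YARN class. Below is a minimal sketch for inspecting such a value: it counts repeated entries and lists the ones under a YARN directory. `classpath_report` is a hypothetical helper. It only inspects the string, so wildcard entries such as `yarn/*` are not expanded, and their presence does not prove the jars exist.

```python
from typing import Dict, List


def classpath_report(classpath: str) -> Dict[str, object]:
    """Summarize a Unix-style classpath string (entries separated by ':')."""
    entries: List[str] = [e for e in classpath.split(":") if e]
    # dict.fromkeys keeps the first occurrence of each entry, in order
    unique: List[str] = list(dict.fromkeys(entries))
    return {
        "total": len(entries),
        "unique": unique,
        "duplicates": len(entries) - len(unique),
        # entries pointing into the Hadoop YARN share directory
        "yarn": [e for e in unique if "/share/hadoop/yarn" in e],
    }
```

To check the current shell's value, call it as `classpath_report(os.environ.get("HADOOP_CLASSPATH", ""))` after `import os`. Note that this reflects the shell the script runs in, which is not necessarily the environment the Flink CLI was launched from.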
Could you check whether the Flink job has been submitted successfully? If it has, you should find a log line like the following in the JobManager log:

Starting execution of job ...

It would also help a lot if you could share the full JobManager and client logs.

Best,
Yang

Rainie Li <[hidden email]> wrote on Thu, Jul 16, 2020 at 4:03 AM:
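Yang's check can also be scripted. Below is a minimal sketch that scans a JobManager log for lines containing "Starting execution of job". It also keeps the last few lines, which are the ones to look at, or share, when that marker never appears. Only this prefix is matched because the rest of the line is elided in the reply. `check_submission` and its `tail` parameter are hypothetical.

```python
from collections import deque
from typing import Deque, Iterable, List, Tuple

# Prefix from the reply above; the remainder of the real log line is not shown there.
MARKER = "Starting execution of job"


def check_submission(lines: Iterable[str], tail: int = 20) -> Tuple[List[str], List[str]]:
    """Return (lines containing MARKER, the last `tail` lines of the log)."""
    hits: List[str] = []
    last: Deque[str] = deque(maxlen=tail)  # only the most recent `tail` lines are kept
    for line in lines:
        text = line.rstrip("\n")
        if MARKER in text:
            hits.append(text)
        last.append(text)
    return hits, list(last)
```

An empty `hits` list suggests the job never reached the JobManager. In that case, the returned tail together with the client log is what Yang asked to see.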