flink app crashed

rainieli
Hi All,

I am new to Flink. Any idea why my Flink app's JobManager is stuck? Here is the bottom part of the JobManager log. Any suggestions would be appreciated.
2020-07-15 16:49:52,749 INFO org.apache.flink.runtime.rpc.akka.AkkaRpcService - Starting RPC endpoint for org.apache.flink.runtime.dispatcher.StandaloneDispatcher at akka://flink/user/dispatcher .
2020-07-15 16:49:52,759 INFO org.apache.flink.runtime.leaderretrieval.ZooKeeperLeaderRetrievalService - Starting ZooKeeperLeaderRetrievalService /leader/resource_manager_lock.
2020-07-15 16:49:52,759 INFO org.apache.flink.runtime.leaderretrieval.ZooKeeperLeaderRetrievalService - Starting ZooKeeperLeaderRetrievalService /leader/dispatcher_lock.
2020-07-15 16:49:52,762 INFO org.apache.flink.runtime.leaderelection.ZooKeeperLeaderElectionService - Starting ZooKeeperLeaderElectionService ZooKeeperLeaderElectionService{leaderPath='/leader/dispatcher_lock'}.
2020-07-15 16:49:52,790 INFO org.apache.flink.runtime.dispatcher.StandaloneDispatcher - Dispatcher /user/dispatcher was granted leadership with fencing token
2020-07-15 16:49:52,791 INFO org.apache.flink.runtime.dispatcher.StandaloneDispatcher - Recovering all persisted jobs.
2020-07-15 16:49:52,931 INFO org.apache.hadoop.yarn.client.ConfiguredRMFailoverProxyProvider - Failing over to rm1
2020-07-15 16:49:53,014 INFO org.apache.flink.yarn.YarnResourceManager - Recovered 0 containers from previous attempts ([]).
2020-07-15 16:49:53,018 INFO org.apache.hadoop.yarn.client.api.async.impl.NMClientAsyncImpl - Upper bound of the thread pool size is 500
2020-07-15 16:49:53,020 INFO org.apache.hadoop.yarn.client.api.impl.ContainerManagementProtocolProxy - yarn.client.max-cached-nodemanagers-proxies : 0
2020-07-15 16:49:53,021 INFO org.apache.flink.runtime.leaderelection.ZooKeeperLeaderElectionService - Starting ZooKeeperLeaderElectionService ZooKeeperLeaderElectionService{leaderPath='/leader/resource_manager_lock'}.
2020-07-15 16:49:53,042 INFO org.apache.flink.yarn.YarnResourceManager - ResourceManager akka.tcp://flink@cluster-dev-001/user/resourcemanager was granted leadership with fencing token
2020-07-15 16:49:53,046 INFO org.apache.flink.runtime.resourcemanager.slotmanager.SlotManagerImpl - Starting the SlotManager.
2020-07-15 16:50:52,217 INFO org.apache.kafka.clients.Metadata - Cluster ID: FZrfSqHiTpaZwEzIRYkCLQ
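
One thing the excerpt does show is a pause of roughly 59 seconds between the SlotManager start (16:49:53,046) and the Kafka metadata line (16:50:52,217). A pause is not an error by itself, but it can point at where to look. Below is a minimal Python sketch for spotting such pauses, assuming every log record starts with a `yyyy-MM-dd HH:mm:ss,SSS` timestamp as above; `find_gaps` and the 30-second threshold are hypothetical choices, not anything Flink provides.

```python
from datetime import datetime


def find_gaps(lines, min_seconds=30.0):
    """Return (previous_line, next_line, gap_in_seconds) for each long pause."""
    gaps = []
    previous = None  # (timestamp, text) of the last timestamped line seen
    for line in lines:
        try:
            # The first 23 characters hold e.g. "2020-07-15 16:49:53,046".
            stamp = datetime.strptime(line[:23], "%Y-%m-%d %H:%M:%S,%f")
        except ValueError:
            continue  # stack-trace or continuation line without a timestamp
        text = line.rstrip("\n")
        if previous is not None:
            gap = (stamp - previous[0]).total_seconds()
            if gap >= min_seconds:
                gaps.append((previous[1], text, gap))
        previous = (stamp, text)
    return gaps
```

Feeding it the lines of a log file returns one tuple per pause, so the lines on either side of the pause can be read first.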


Thanks
Best regards
Rainie

Re: flink app crashed

Jesse Lord

Hi Rainie,

 

I am relatively new to Flink, but I suspect that your error is somewhere else in the log. I have found most of my problems by searching for the word “error” or “exception”. Since all of these log lines are at the INFO level, they are probably not highlighting any real issues. If you send more of the log or find an error line, that might help others debug.
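
The search described above can be sketched in Python. This is only an illustration, not part of Flink: the function name `find_suspicious_lines` and the regular expressions are assumptions based on the log layout shown in this thread (timestamp, level, logger, message). It flags lines logged at WARN or above, plus any line mentioning "error" or "exception", which also catches stack-trace headers that carry no timestamp.

```python
import re

# Matches the level field in lines such as
# "2020-07-15 18:19:36,463 WARN  org.apache.flink.client.cli.CliFrontend - ..."
LEVEL_RE = re.compile(r"^\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}\s+(\w+)\s")
KEYWORD_RE = re.compile(r"error|exception", re.IGNORECASE)


def find_suspicious_lines(lines):
    """Return (line_number, text) pairs that are worth reading first."""
    hits = []
    for number, line in enumerate(lines, start=1):
        match = LEVEL_RE.match(line)
        level = match.group(1) if match else None
        if level in ("WARN", "ERROR", "FATAL") or KEYWORD_RE.search(line):
            hits.append((number, line.rstrip("\n")))
    return hits
```

To run it on a real file, pass the open file object, for example `find_suspicious_lines(open("jobmanager.log"))`, where the file name is a placeholder.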

 

Thanks,

Jesse

 

From: Rainie Li <[hidden email]>
Date: Wednesday, July 15, 2020 at 10:54 AM
To: "[hidden email]" <[hidden email]>
Subject: flink app crashed

 


Re: flink app crashed

rainieli
Thank you, Jesse.

Here is more log info:

2020-07-15 18:19:36,456 INFO  org.apache.flink.client.cli.CliFrontend                       - --------------------------------------------------------------------------------
2020-07-15 18:19:36,460 INFO  org.apache.flink.configuration.GlobalConfiguration            - Loading configuration property: jobmanager.rpc.address, localhost
2020-07-15 18:19:36,460 INFO  org.apache.flink.configuration.GlobalConfiguration            - Loading configuration property: jobmanager.rpc.port, 6123
2020-07-15 18:19:36,460 INFO  org.apache.flink.configuration.GlobalConfiguration            - Loading configuration property: jobmanager.heap.size, 1024m
2020-07-15 18:19:36,460 INFO  org.apache.flink.configuration.GlobalConfiguration            - Loading configuration property: taskmanager.heap.size, 1024m
2020-07-15 18:19:36,460 INFO  org.apache.flink.configuration.GlobalConfiguration            - Loading configuration property: taskmanager.numberOfTaskSlots, 1
2020-07-15 18:19:36,460 INFO  org.apache.flink.configuration.GlobalConfiguration            - Loading configuration property: parallelism.default, 1
2020-07-15 18:19:36,461 INFO  org.apache.flink.configuration.GlobalConfiguration            - Loading configuration property: jobmanager.execution.failover-strategy, region
2020-07-15 18:19:36,463 WARN  org.apache.flink.client.cli.CliFrontend                       - Could not load CLI class org.apache.flink.yarn.cli.FlinkYarnSessionCli.
java.lang.NoClassDefFoundError: org/apache/hadoop/yarn/exceptions/YarnException
        at java.lang.Class.forName0(Native Method)
        at java.lang.Class.forName(Class.java:264)
        at org.apache.flink.client.cli.CliFrontend.loadCustomCommandLine(CliFrontend.java:1185)
        at org.apache.flink.client.cli.CliFrontend.loadCustomCommandLines(CliFrontend.java:1145)
        at org.apache.flink.client.cli.CliFrontend.main(CliFrontend.java:1070)
Caused by: java.lang.ClassNotFoundException: org.apache.hadoop.yarn.exceptions.YarnException
        at java.net.URLClassLoader.findClass(URLClassLoader.java:382)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
        at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:349)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
        ... 5 more
2020-07-15 18:19:36,519 INFO  org.apache.flink.core.fs.FileSystem                           - Hadoop is not in the classpath/dependencies. The extended set of supported File Systems via Hadoop is not available.
2020-07-15 18:19:36,647 INFO  org.apache.flink.runtime.security.modules.HadoopModuleFactory  - Cannot create Hadoop Security Module because Hadoop cannot be found in the Classpath.
2020-07-15 18:19:36,658 INFO  org.apache.flink.runtime.security.SecurityUtils               - Cannot install HadoopSecurityContext because Hadoop cannot be found in the Classpath.
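
The WARN and the three INFO lines about Hadoop above all say the same thing: the client process that wrote this log could not see the Hadoop classes. One common way to supply them to Flink's launcher scripts is the `HADOOP_CLASSPATH` environment variable (the Flink documentation suggests exporting the output of `hadoop classpath`). Here is a small Python sketch for checking what the current environment provides; the function name is hypothetical, and an empty result is only a hint, not proof of the cause.

```python
import os


def hadoop_classpath_entries(environ=None):
    """Return the non-empty HADOOP_CLASSPATH entries, or [] if it is unset."""
    environ = os.environ if environ is None else environ
    value = environ.get("HADOOP_CLASSPATH", "")
    # os.pathsep is ":" on Unix-like systems and ";" on Windows.
    return [entry for entry in value.split(os.pathsep) if entry]
```

Calling `hadoop_classpath_entries()` in the same shell session that launches the Flink client shows whether the variable is visible to that session.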


Best regards
Rainie

On Wed, Jul 15, 2020 at 11:49 AM Jesse Lord <[hidden email]> wrote:


Re: flink app crashed

rainieli
This is the console log after launching the app:

2020-07-15 19:25:28,507 INFO  org.apache.flink.yarn.AbstractYarnClusterDescriptor           - YARN application has been deployed successfully.
Starting execution of program
-------Environment Variables-----
DOCKER_CONFIG=/etc/.docker
FLINK_BIN_DIR=/usr/local/flink-1.9.1/bin
FLINK_CONF_DIR=/etc/flink-1.9.1/conf/
FLINK_LIB_DIR=/usr/local/flink-1.9.1/lib
FLINK_LOG_DIR=/home/karthik/pincohesion
FLINK_OPT_DIR=/usr/local/flink-1.9.1/opt
FLINK_PLUGINS_DIR=/usr/local/flink-1.9.1/plugins
HADOOP_CLASSPATH=/usr/local/hadoop/etc/hadoop:/usr/local/hadoop/share/hadoop/common/lib/*:/usr/local/hadoop/share/hadoop/common/*:/usr/local/hadoop/share/hadoop/hdfs:/usr/local/hadoop/share/hadoop/hdfs/lib/*:/usr/local/hadoop/share/hadoop/hdfs/*:/usr/local/hadoop/share/hadoop/yarn/lib/*:/usr/local/hadoop/share/hadoop/yarn/*:/usr/local/hadoop/share/hadoop/mapreduce/lib/*:/usr/local/hadoop/share/hadoop/mapreduce/*:/usr/local/hadoop/etc/hadoop:/usr/local/hadoop/share/hadoop/common/lib/*:/usr/local/hadoop/share/hadoop/common/*:/usr/local/hadoop/share/hadoop/hdfs:/usr/local/hadoop/share/hadoop/hdfs/lib/*:/usr/local/hadoop/share/hadoop/hdfs/*:/usr/local/hadoop/share/hadoop/yarn/lib/*:/usr/local/hadoop/share/hadoop/yarn/*:/usr/local/hadoop/share/hadoop/mapreduce/lib/*:/usr/local/hadoop/share/hadoop/mapreduce/*:/usr/local/hadoop/etc/hadoop:/usr/local/hadoop/share/hadoop/common/lib/*:/usr/local/hadoop/share/hadoop/common/*:/usr/local/hadoop/share/hadoop/hdfs:/usr/local/hadoop/share/hadoop/hdfs/lib/*:/usr/local/hadoop/share/hadoop/hdfs/*:/usr/local/hadoop/share/hadoop/yarn/lib/*:/usr/local/hadoop/share/hadoop/yarn/*:/usr/local/hadoop/share/hadoop/mapreduce/lib/*:/usr/local/hadoop/share/hadoop/mapreduce/*:/usr/local/hadoop/etc/hadoop:/usr/local/hadoop/share/hadoop/common/lib/*:/usr/local/hadoop/share/hadoop/common/*:/usr/local/hadoop/share/hadoop/hdfs:/usr/local/hadoop/share/hadoop/hdfs/lib/*:/usr/local/hadoop/share/hadoop/hdfs/*:/usr/local/hadoop/share/hadoop/yarn/lib/*:/usr/local/hadoop/share/hadoop/yarn/*:/usr/local/hadoop/share/hadoop/mapreduce/lib/*:/usr/local/hadoop/share/hadoop/mapreduce/*:/usr/local/hadoop/contrib/capacity-scheduler/*.jar:/usr/local/hadoop/share/hadoop/tools/lib/*:/usr/local/hadoop/contrib/capacity-scheduler/*.jar:/usr/local/hadoop/share/hadoop/tools/lib/*:/usr/local/hadoop/contrib/capacity-scheduler/*.jar:/usr/local/hadoop/share/hadoop/tools/lib/*:/usr/local/hadoop/contrib/capacity-scheduler/*.jar:/usr/local/hadoop/share/hadoop/tools/lib/*
HADOOP_CONF_DIR=/usr/local/hadoop/etc/hadoop
HADOOP_HOME=/usr/local/hadoop
HISTFILE=/home/rainieli/.bash_history
HISTFILESIZE=2000
HISTIGNORE=
HISTSIZE=1000
HOME=/home/rainieli
JAVA_HOME=/usr/lib/jvm/java-8-oracle
LANG=C.UTF-8
LC_TERMINAL=iTerm2
LC_TERMINAL_VERSION=3.3.9
LESSCLOSE=/usr/bin/lesspipe %s %s
LESSOPEN=| /usr/bin/lesspipe %s
LOGNAME=rainieli
LS_COLORS=rs=0:di=01;34:ln=01;36:mh=00:pi=40;33:so=01;35:do=01;35:bd=40;33;01:cd=40;33;01:or=40;31;01:mi=00:su=37;41:sg=30;43:ca=30;41:tw=30;42:ow=34;42:st=37;44:ex=01;32:*.tar=01;31:*.tgz=01;31:*.arc=01;31:*.arj=01;31:*.taz=01;31:*.lha=01;31:*.lz4=01;31:*.lzh=01;31:*.lzma=01;31:*.tlz=01;31:*.txz=01;31:*.tzo=01;31:*.t7z=01;31:*.zip=01;31:*.z=01;31:*.Z=01;31:*.dz=01;31:*.gz=01;31:*.lrz=01;31:*.lz=01;31:*.lzo=01;31:*.xz=01;31:*.zst=01;31:*.tzst=01;31:*.bz2=01;31:*.bz=01;31:*.tbz=01;31:*.tbz2=01;31:*.tz=01;31:*.deb=01;31:*.rpm=01;31:*.jar=01;31:*.war=01;31:*.ear=01;31:*.sar=01;31:*.rar=01;31:*.alz=01;31:*.ace=01;31:*.zoo=01;31:*.cpio=01;31:*.7z=01;31:*.rz=01;31:*.cab=01;31:*.wim=01;31:*.swm=01;31:*.dwm=01;31:*.esd=01;31:*.jpg=01;35:*.jpeg=01;35:*.mjpg=01;35:*.mjpeg=01;35:*.gif=01;35:*.bmp=01;35:*.pbm=01;35:*.pgm=01;35:*.ppm=01;35:*.tga=01;35:*.xbm=01;35:*.xpm=01;35:*.tif=01;35:*.tiff=01;35:*.png=01;35:*.svg=01;35:*.svgz=01;35:*.mng=01;35:*.pcx=01;35:*.mov=01;35:*.mpg=01;35:*.mpeg=01;35:*.m2v=01;35:*.mkv=01;35:*.webm=01;35:*.ogm=01;35:*.mp4=01;35:*.m4v=01;35:*.mp4v=01;35:*.vob=01;35:*.qt=01;35:*.nuv=01;35:*.wmv=01;35:*.asf=01;35:*.rm=01;35:*.rmvb=01;35:*.flc=01;35:*.avi=01;35:*.fli=01;35:*.flv=01;35:*.gl=01;35:*.dl=01;35:*.xcf=01;35:*.xwd=01;35:*.yuv=01;35:*.cgm=01;35:*.emf=01;35:*.ogv=01;35:*.ogx=01;35:*.aac=00;36:*.au=00;36:*.flac=00;36:*.m4a=00;36:*.mid=00;36:*.midi=00;36:*.mka=00;36:*.mp3=00;36:*.mpc=00;36:*.ogg=00;36:*.ra=00;36:*.wav=00;36:*.oga=00;36:*.opus=00;36:*.spx=00;36:*.xspf=00;36:
MAIL=/var/mail/rainieli
OLDPWD=/home/rainieli
PATH=/usr/lib/jvm/java-8-oracle/bin:/usr/lib/jvm/java-8-oracle/bin:/usr/lib/jvm/java-8-oracle/bin:/usr/lib/jvm/java-8-oracle/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/local/hadoop/bin:/usr/local/hadoop/bin
PWD=/home/karthik
SHELL=/bin/bash
SHLVL=1
SSH_CLIENT=172.16.11.92 64705 22
SSH_CONNECTION=172.16.11.92 64705 10.2.66.110 22
SSH_TTY=/dev/pts/2
S_COLORS=auto
TERM=xterm-256color
USER=rainieli
-------Command Line Arguments-----
[--conf-file, PIN_JOIN_pin_cohesion_realtime_signal.prod.properties]
Current working directory: null
....... (some serverset info here)
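
The `HADOOP_CLASSPATH` value in the output above repeats the same group of directories several times. Repeated entries are normally harmless, but they make the value hard to read when checking that the YARN jars are present. Here is a short Python sketch that lists each entry once; `distinct_classpath_entries` is a hypothetical helper, and `:` is assumed as the separator because the paths are Unix-style.

```python
def distinct_classpath_entries(classpath, separator=":"):
    """Split a classpath string and drop repeats, keeping first-seen order."""
    entries = [entry for entry in classpath.split(separator) if entry]
    # dict preserves insertion order, so this removes duplicates stably.
    return list(dict.fromkeys(entries))
```

Printing the result one entry per line makes it easy to confirm that a directory such as `/usr/local/hadoop/share/hadoop/yarn/*` is included.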

Thanks
Best regards
Rainie

On Wed, Jul 15, 2020 at 12:45 PM Rainie Li <[hidden email]> wrote:

Re: flink app crashed

Yang Wang
Could you check whether the Flink job has been submitted successfully? You should be able to
find log lines like the following in the JobManager log.

Starting execution of job ...

It would also help a lot if you could share the full JobManager and client logs.
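
The check suggested here can be automated with a few lines of Python. This sketch assumes the marker text is exactly the prefix quoted above, which may differ between Flink versions; `find_job_start_lines` is a hypothetical name. Note that the client console line "Starting execution of program" seen earlier in the thread does not match this marker.

```python
MARKER = "Starting execution of job"


def find_job_start_lines(lines, marker=MARKER):
    """Return every log line that contains the job-start marker."""
    return [line.rstrip("\n") for line in lines if marker in line]
```

An empty result for the JobManager log would suggest that the job never reached the JobManager, which narrows the problem down to the submission step.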

Best,
Yang

On Thu, Jul 16, 2020 at 4:03 AM Rainie Li <[hidden email]> wrote: