Hi Manju,
I guess this exception
Caused by: org.apache.hadoop.hdfs.BlockMissingException: Could not obtain block: BP-1651346363-10.20.1.81-1525354906737:blk_1083182315_9441494 file=/flink/checkpoints/submittedJobGraph480ddf9572ed
at org.apache.hadoop.hdfs.DFSInputStream.refetchLocations(DFSInputStream.java:1052)
at org.apache.hadoop.hdfs.DFSInputStream.chooseDataNode(DFSInputStream.java:1036)
at org.apache.hadoop.hdfs.DFSInputStream.chooseDataNode(DFSInputStream.java:1015)
at org.apache.hadoop.hdfs.DFSInputStream.blockSeekTo(DFSInputStream.java:647)
at org.apache.hadoop.hdfs.DFSInputStream.readWithStrategy(DFSInputStream.java:926)
at org.apache.hadoop.hdfs.DFSInputStream.read(DFSInputStream.java:982)
at java.io.DataInputStream.read(DataInputStream.java:149)
at org.apache.flink.runtime.fs.hdfs.HadoopDataInputStream.read(HadoopDataInputStream.java:94)
at java.io.ObjectInputStream$PeekInputStream.read(ObjectInputStream.java:2620)
at java.io.ObjectInputStream$PeekInputStream.readFully(ObjectInputStream.java:2636)
at java.io.ObjectInputStream$BlockDataInputStream.readShort(ObjectInputStream.java:3113)
at java.io.ObjectInputStream.readStreamHeader(ObjectInputStream.java:853)
at java.io.ObjectInputStream.<init>(ObjectInputStream.java:349)
at org.apache.flink.util.InstantiationUtil$ClassLoaderObjectInputStream.<init>(InstantiationUtil.java:68)
at org.apache.flink.util.InstantiationUtil.deserializeObject(InstantiationUtil.java:520)
at org.apache.flink.util.InstantiationUtil.deserializeObject(InstantiationUtil.java:503)
at org.apache.flink.runtime.state.RetrievableStreamStateHandle.retrieveState(RetrievableStreamStateHandle.java:58)
at org.apache.flink.runtime.jobmanager.ZooKeeperSubmittedJobGraphStore.recoverJobGraph(ZooKeeperSubmittedJobGraphStore.java:202)
and the following log statements
2019-05-07 08:28:54,136 WARN org.apache.hadoop.hdfs.DFSClient - No live nodes contain block BP-1651346363-10.20.1.81-1525354906737:blk_1083182315_9441494 after checking nodes = [], ignoredNodes = null
2019-05-07 08:28:54,137 INFO org.apache.hadoop.hdfs.DFSClient - No node available for BP-1651346363-10.20.1.81-1525354906737:blk_1083182315_9441494 file=/flink/checkpoints/submittedJobGraph480ddf9572ed
2019-05-07 08:28:54,137 INFO org.apache.hadoop.hdfs.DFSClient - Could not obtain BP-1651346363-10.20.1.81-1525354906737:blk_1083182315_9441494 from any node: No live nodes contain current block Block locations: Dead nodes: . Will get new block locations from namenode and retry...
2019-05-07 08:28:54,137 WARN org.apache.hadoop.hdfs.DFSClient - DFS chooseDataNode: got # 1 IOException, will wait for 1498.8531884268646 msec.
pretty much explain what's happening. The DFSClient finds no live DataNode holding the block (nodes = []), so Flink cannot read all the blocks of the submitted job graph file and fails while recovering the job graph. This looks like an HDFS problem to me.
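If it helps with narrowing this down on the HDFS side, here is a minimal sketch (not something Flink ships) that pulls the block pool, block ID and file path out of the BlockMissingException message and prints an `hdfs fsck` call for that file. The function names and the regular expression are my own illustration, and I am assuming the message format matches the one in your stack trace.

```python
import re
import shlex

# Matches the BlockMissingException message as it appears in the stack trace.
# Assumption: the format "Could not obtain block: <pool>:<block> file=<path>"
# is the same on your Hadoop version.
BLOCK_MISSING = re.compile(
    r"Could not obtain block: "
    r"(?P<pool>BP-[\d.\-]+):(?P<block>blk_\d+_\d+) file=(?P<path>\S+)"
)


def parse_block_missing(message):
    """Return (block_pool, block_id, path), or None if the message does not match."""
    match = BLOCK_MISSING.search(message)
    if match is None:
        return None
    return match.group("pool"), match.group("block"), match.group("path")


def fsck_command(path):
    """Build an hdfs fsck call that lists the file's blocks and their locations."""
    return ["hdfs", "fsck", path, "-files", "-blocks", "-locations"]


if __name__ == "__main__":
    line = (
        "Caused by: org.apache.hadoop.hdfs.BlockMissingException: "
        "Could not obtain block: "
        "BP-1651346363-10.20.1.81-1525354906737:blk_1083182315_9441494 "
        "file=/flink/checkpoints/submittedJobGraph480ddf9572ed"
    )
    pool, block, path = parse_block_missing(line)
    print("missing block:", block, "in pool", pool)
    print("check with:", " ".join(shlex.quote(arg) for arg in fsck_command(path)))
```

Running the printed command on a machine with an HDFS client should show whether the NameNode considers that block missing or corrupt.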
Cheers,
Till