checkpointing when yarn session crashed

checkpointing when yarn session crashed

ysn2233
Hi everyone:

I am running a BucketingSink job (writing to HDFS) with Flink on YARN. I found that if the YARN session crashes, or if I kill the YARN session manually, the file on HDFS is not renamed to the .pending state and the latest checkpoint has no _metadata file. As a result, I cannot resume my job after YARN restarts. Any ideas about this issue? Thank you very much!
Re: checkpointing when yarn session crashed

Guowei Ma
Could you give more details? For example: which Flink version do you use? Which StateBackend do you use? Has any checkpoint completed successfully?

I can't reproduce your problem. (I used BucketingSinkTestProgram with externalized checkpoints enabled, Flink 1.7.2, and the default StateBackend.)
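
To check whether any checkpoint completed successfully, one option is to look for the newest checkpoint directory that actually contains a _metadata file. The following is only a minimal sketch, not part of Flink: it assumes the usual externalized checkpoint layout of `<checkpoint-dir>/<job-id>/chk-<n>/_metadata`, and it reads a local path with `java.nio` for illustration (for HDFS you would need the Hadoop FileSystem API instead). The class name `LatestCompleteCheckpoint` is hypothetical.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Optional;

public class LatestCompleteCheckpoint {

    /**
     * Returns the highest-numbered chk-<n> directory under jobCheckpointDir
     * that contains a _metadata file, or empty if there is none.
     * Assumption: a chk-<n> directory without _metadata is incomplete.
     */
    public static Optional<Path> find(Path jobCheckpointDir) throws IOException {
        Path best = null;
        long bestId = -1;
        try (DirectoryStream<Path> dirs =
                 Files.newDirectoryStream(jobCheckpointDir, "chk-*")) {
            for (Path dir : dirs) {
                long id;
                try {
                    // Parse the numeric suffix so that chk-10 sorts after chk-9.
                    String name = dir.getFileName().toString();
                    id = Long.parseLong(name.substring("chk-".length()));
                } catch (NumberFormatException e) {
                    continue; // name does not follow the chk-<n> pattern
                }
                if (id > bestId && Files.isRegularFile(dir.resolve("_metadata"))) {
                    best = dir;
                    bestId = id;
                }
            }
        }
        return Optional.ofNullable(best);
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: LatestCompleteCheckpoint <job-checkpoint-dir>");
            return;
        }
        Optional<Path> latest = find(Paths.get(args[0]));
        System.out.println(
            latest.map(Path::toString).orElse("no complete checkpoint found"));
    }
}
```

If this finds a directory, its path can be passed to `bin/flink run -s <path>` to resume from that retained checkpoint. If it finds none, no checkpoint completed before the session went down.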

Best,
Guowei


Shengnan YU <[hidden email]> wrote on Mon, Apr 8, 2019 at 11:05 AM:
Hi everyone:

I am running a BucketingSink job (writing to HDFS) with Flink on YARN. I found that if the YARN session crashes, or if I kill the YARN session manually, the file on HDFS is not renamed to the .pending state and the latest checkpoint has no _metadata file. As a result, I cannot resume my job after YARN restarts. Any ideas about this issue? Thank you very much!