allowNonRestoredState: metadata file in checkpoint dir missing

Deshpande, Omkar

Hello,


When we delete an operator, we run our application with --allowNonRestoredState=true, as described in the documentation. When running with this flag, we have observed that the _metadata file is not generated in the checkpoint directory. So, if the application fails, we cannot restart from the checkpoint. And since the application has already failed, we can’t take a savepoint either.

Is it expected behavior that the _metadata file is not created in this case?

How do we achieve resilience while using --allowNonRestoredState?


We are using Beam with the Flink runner (Java).

  • Beam 2.19

  • Flink 1.9

Omkar

Re: allowNonRestoredState: metadata file in checkpoint dir missing

Congxian Qiu
Hi Omkar,

`--allowNonRestoredState` does not affect checkpointing behavior; it only affects the restore logic.

As for the _metadata file not being generated, could you please check 1) whether checkpointing is enabled for the job; 2) whether any checkpoint has completed successfully.
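
Check 2) can be done by looking at the checkpoint directory itself. The following is a minimal sketch, not part of Flink or Beam. It assumes the usual layout in which each completed checkpoint is a `chk-<n>` subdirectory holding a `_metadata` file, and it assumes the directory is reachable through the local filesystem (for HDFS or S3 the listing calls would differ). The class and method names are hypothetical.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Optional;

// Hypothetical helper: finds the newest checkpoint that can be used for a restore.
class CheckpointMetadataCheck {

    // Returns the highest-numbered chk-<n> directory under jobCheckpointDir
    // that contains a regular file named _metadata, if there is one.
    static Optional<Path> latestCompletedCheckpoint(Path jobCheckpointDir) throws IOException {
        Path best = null;
        long bestId = -1;
        try (DirectoryStream<Path> dirs = Files.newDirectoryStream(jobCheckpointDir, "chk-*")) {
            for (Path dir : dirs) {
                long id;
                try {
                    String name = dir.getFileName().toString();
                    id = Long.parseLong(name.substring("chk-".length()));
                } catch (NumberFormatException e) {
                    continue; // name does not end in a checkpoint number, skip it
                }
                // Only a directory with _metadata counts as a completed checkpoint.
                if (id > bestId && Files.isRegularFile(dir.resolve("_metadata"))) {
                    bestId = id;
                    best = dir;
                }
            }
        }
        return Optional.ofNullable(best);
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: CheckpointMetadataCheck <job-checkpoint-dir>");
            return;
        }
        Optional<Path> latest = latestCompletedCheckpoint(Paths.get(args[0]));
        System.out.println(latest
                .map(p -> "Completed checkpoint found: " + p)
                .orElse("No chk-<n> directory with a _metadata file found"));
    }
}
```

If this reports no completed checkpoint, the job either has checkpointing disabled or has not yet finished a checkpoint, which matches the two cases listed above.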
Best,
Congxian


In reply to Deshpande, Omkar <[hidden email]>, who wrote on Saturday, August 1, 2020 at 6:15 AM.