Hi,
Externalized checkpoints [1] seem to be exactly what you are looking for.
By default, checkpoints are not persisted externally and are cleaned up when
the job is cancelled. If you configure them to be externalized, their metadata
is written to persistent storage and they are not automatically cleaned up
when the job fails, so they can be used to resume the job.
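For reference, here is a minimal sketch of enabling this in a Flink 1.4 DataStream job. The 60 s interval, the job name and the toy pipeline are placeholders. It also assumes `state.checkpoints.dir` is set in flink-conf.yaml, which tells Flink where to write the externalized checkpoint metadata.

```java
import org.apache.flink.streaming.api.environment.CheckpointConfig.ExternalizedCheckpointCleanup;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class ExternalizedCheckpointsJob {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        // Take a checkpoint every 60 seconds (placeholder interval).
        env.enableCheckpointing(60_000L);

        // Externalize checkpoints and also keep them when the job is cancelled.
        // (Externalized checkpoints are retained on job failure in either mode.)
        env.getCheckpointConfig().enableExternalizedCheckpoints(
                ExternalizedCheckpointCleanup.RETAIN_ON_CANCELLATION);

        // Placeholder pipeline; replace with the real job.
        env.fromElements(1, 2, 3).print();

        env.execute("externalized-checkpoints-example");
    }
}
```

A retained checkpoint can then be resumed much like a savepoint, by passing its metadata path to `bin/flink run -s`.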
On the other hand, it would be interesting to understand why your savepoint
restore sometimes fails.
If you suspect it could be an issue with Flink, could you provide any more
details on the failure?
Cheers,
Gordon
[1] https://ci.apache.org/projects/flink/flink-docs-release-1.4/ops/state/checkpoints.html#externalized-checkpoints