Testing externalized checkpoints in a YARN-based cluster, configured with:
env.getCheckpointConfig.enableExternalizedCheckpoints(ExternalizedCheckpointCleanup.RETAIN_ON_CANCELLATION);

I can confirm that the checkpoint is retained between cancelled jobs; however, it’s deleted when the Job Manager session is gracefully shut down. We’d really like the persistent checkpoint to be treated like a Savepoint and not be deleted. Is there a way to enable this?
+Ufuk
Ufuk recently worked on that, if I'm not mistaken. Do you have an idea what could be going on here?

On Wed, 2 Nov 2016 at 21:52 Clifford Resnick wrote:
They should actually not be deleted.

Could you please share the logs with me? In the meantime, I will try to reproduce this.

On Thu, Nov 3, 2016 at 2:04 PM, Aljoscha Krettek wrote:
I don't need the logs. Externalized checkpoints have been configured to be deleted when the job is suspended, too. When the YARN session is terminated, all jobs are suspended.

The behaviour seems like a bug. As a workaround, you have to cancel the job before you shut down the YARN session. Let me think for a minute about whether there is a good reason to discard externalized checkpoints on suspension, but I don't think there is.

On Thu, Nov 3, 2016 at 3:00 PM, Ufuk Celebi wrote:
A fix is pending here: https://github.com/apache/flink/pull/2750
With this change, the behaviour on graceful shutdown/suspension respects the configured cancellation behaviour.

On Thu, Nov 3, 2016 at 3:23 PM, Ufuk Celebi wrote:
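The retention rule discussed in this thread can be modelled in a few lines of Scala. This is a minimal sketch under stated assumptions, not Flink's implementation: `CleanupMode`, `StopReason`, `discardedBeforeFix` and `discardedAfterFix` are hypothetical names, and only cancellation and suspension are modelled. It contrasts the reported behaviour (a suspended job always loses its externalized checkpoint) with the behaviour described for the pending fix (suspension follows the configured cancellation behaviour).

```scala
// Hypothetical names for illustration only; these are NOT Flink classes.
// Models whether an externalized checkpoint is discarded when a job stops.
sealed trait CleanupMode
case object RetainOnCancellation extends CleanupMode
case object DeleteOnCancellation extends CleanupMode

sealed trait StopReason
case object Cancelled extends StopReason
case object Suspended extends StopReason // e.g. the YARN session is shut down gracefully

object ExternalizedCheckpointModel {
  // Behaviour reported in the thread: cancellation honours the configured
  // mode, but suspension always discards the checkpoint.
  def discardedBeforeFix(mode: CleanupMode, reason: StopReason): Boolean =
    reason match {
      case Cancelled => mode == DeleteOnCancellation
      case Suspended => true
    }

  // Behaviour described for the fix: suspension follows the same rule
  // as cancellation.
  def discardedAfterFix(mode: CleanupMode, reason: StopReason): Boolean =
    reason match {
      case Cancelled | Suspended => mode == DeleteOnCancellation
    }

  def main(args: Array[String]): Unit = {
    val mode: CleanupMode = RetainOnCancellation
    for (reason <- Seq[StopReason](Cancelled, Suspended)) {
      println(s"$reason: before=${discardedBeforeFix(mode, reason)}, after=${discardedAfterFix(mode, reason)}")
    }
  }
}
```

With `RetainOnCancellation`, the old rule discards the checkpoint only in the `Suspended` case. This is why cancelling the job before shutting down the YARN session works around the problem: the job then stops as `Cancelled`, and the checkpoint is kept.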
Hi,

Is anything keeping this from being merged into master?

On Thu, Nov 3, 2016 at 10:56 AM, Ufuk Celebi wrote: