Why are externalized checkpoints deleted on Job Manager exit?

Clifford Resnick
Testing externalized checkpoints in a YARN-based cluster, configured with:

env.getCheckpointConfig.enableExternalizedCheckpoints(ExternalizedCheckpointCleanup.RETAIN_ON_CANCELLATION);

I can confirm that the checkpoint is retained between cancelled jobs; however, it’s deleted when the Job Manager session is gracefully shut down. We’d really like the persistent checkpoint to be treated like a savepoint and not be deleted. Is there a way to enable this?
 


Re: Why are externalized checkpoints deleted on Job Manager exit?

Aljoscha Krettek
+Ufuk

Ufuk recently worked on that, if I'm not mistaken. Do you have an idea what could be going on here?


On Wed, 2 Nov 2016 at 21:52 Clifford Resnick <[hidden email]> wrote:
Testing externalized checkpoints in a YARN-based cluster, configured with:

env.getCheckpointConfig.enableExternalizedCheckpoints(ExternalizedCheckpointCleanup.RETAIN_ON_CANCELLATION);

I can confirm that the checkpoint is retained between cancelled jobs; however, it’s deleted when the Job Manager session is gracefully shut down. We’d really like the persistent checkpoint to be treated like a savepoint and not be deleted. Is there a way to enable this?



Re: Why are externalized checkpoints deleted on Job Manager exit?

Ufuk Celebi
They should actually not be deleted.

Could you please share the logs with me? In the meantime, I will try
to reproduce this.

On Thu, Nov 3, 2016 at 2:04 PM, Aljoscha Krettek <[hidden email]> wrote:

> +Ufuk
>
> Ufuk recently worked on that, if I'm not mistaken. Do you have an idea what
> could be going on here?

Re: Why are externalized checkpoints deleted on Job Manager exit?

Ufuk Celebi
I don't need the logs. Externalized checkpoints are currently
configured to be deleted when the job is suspended, too. When the YARN
session is terminated, all jobs are suspended.

The behaviour seems like a bug. As a workaround, you have to cancel
the job before you shut down the YARN session. Let me think for a
minute about whether there is a good reason to discard externalized
checkpoints on suspension, but I don't think there is.
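
The behaviour and workaround described in this message can be sketched as a small Java model. This is only an illustration under stated assumptions: the class `SessionShutdownModel`, its enums, and its methods are hypothetical names made up for this sketch and are not Flink's implementation. Only the two cleanup mode names mirror Flink's `ExternalizedCheckpointCleanup`.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Simplified, hypothetical model of the behaviour reported in this thread.
// It is not Flink code and does not call any Flink API.
class SessionShutdownModel {
    enum Cleanup { RETAIN_ON_CANCELLATION, DELETE_ON_CANCELLATION }
    enum JobStatus { RUNNING, CANCELED, SUSPENDED }

    // Reported rule: cancellation honours the cleanup mode,
    // but suspension always discards the externalized checkpoint.
    static boolean checkpointRetained(Cleanup cleanup, JobStatus finalStatus) {
        switch (finalStatus) {
            case CANCELED:
                return cleanup == Cleanup.RETAIN_ON_CANCELLATION;
            case SUSPENDED:
                return false;
            default:
                throw new IllegalArgumentException("job is still running: " + finalStatus);
        }
    }

    // Shutting down the session suspends every job that is still running;
    // jobs that were already cancelled keep their CANCELED status.
    static Map<String, JobStatus> shutDownSession(Map<String, JobStatus> jobs) {
        Map<String, JobStatus> result = new LinkedHashMap<>();
        for (Map.Entry<String, JobStatus> e : jobs.entrySet()) {
            JobStatus status = e.getValue();
            result.put(e.getKey(), status == JobStatus.RUNNING ? JobStatus.SUSPENDED : status);
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, JobStatus> jobs = new LinkedHashMap<>();
        jobs.put("cancelled-first", JobStatus.CANCELED); // workaround applied
        jobs.put("left-running", JobStatus.RUNNING);     // no workaround
        for (Map.Entry<String, JobStatus> e : shutDownSession(jobs).entrySet()) {
            System.out.println(e.getKey() + " -> retained="
                    + checkpointRetained(Cleanup.RETAIN_ON_CANCELLATION, e.getValue()));
        }
    }
}
```

In this model, the job that was cancelled before the session shutdown keeps its checkpoint, while the job that was still running is suspended and loses it, which matches the workaround of cancelling first.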

On Thu, Nov 3, 2016 at 3:00 PM, Ufuk Celebi <[hidden email]> wrote:

> They should actually not be deleted.
>
> Could you please share the logs with me? In the meantime, I will try
> to reproduce this.

Re: Why are externalized checkpoints deleted on Job Manager exit?

Ufuk Celebi
A fix is pending here: https://github.com/apache/flink/pull/2750

With this change, the behaviour on graceful shutdown/suspension
respects the configured cancellation behaviour.
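
As a rough sketch of what "respects the cancellation behaviour" means, the decision for a suspended job can be modelled as the same decision that is made for a cancelled job. The class `SuspensionCleanupModel` and its members are hypothetical names for illustration and do not reproduce the code in the pull request. Only the two cleanup mode names mirror Flink's `ExternalizedCheckpointCleanup`.

```java
// Simplified, hypothetical model of the rule after the change.
// It is not taken from the pull request and does not call any Flink API.
class SuspensionCleanupModel {
    enum Cleanup { RETAIN_ON_CANCELLATION, DELETE_ON_CANCELLATION }
    enum FinalStatus { CANCELED, SUSPENDED }

    // Suspension is decided exactly like cancellation:
    // the checkpoint is discarded only if the mode asks for deletion.
    static boolean discardCheckpoint(Cleanup cleanup, FinalStatus status) {
        boolean discardOnCancellation = cleanup == Cleanup.DELETE_ON_CANCELLATION;
        switch (status) {
            case CANCELED:
            case SUSPENDED:
                return discardOnCancellation;
            default:
                throw new AssertionError("unhandled status: " + status);
        }
    }

    public static void main(String[] args) {
        // Print the decision for every combination of mode and final status.
        for (Cleanup c : Cleanup.values()) {
            for (FinalStatus s : FinalStatus.values()) {
                System.out.println(c + " / " + s + " -> discard=" + discardCheckpoint(c, s));
            }
        }
    }
}
```

Under this rule, a job configured with `RETAIN_ON_CANCELLATION` would keep its externalized checkpoint when the YARN session is shut down gracefully, without having to be cancelled first.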

On Thu, Nov 3, 2016 at 3:23 PM, Ufuk Celebi <[hidden email]> wrote:

> I don't need the logs. Externalized checkpoints are currently
> configured to be deleted when the job is suspended, too. When the YARN
> session is terminated, all jobs are suspended.
>
> The behaviour seems like a bug. As a workaround, you have to cancel
> the job before you shut down the YARN session. Let me think for a
> minute about whether there is a good reason to discard externalized
> checkpoints on suspension, but I don't think there is.

Re: Why are externalized checkpoints deleted on Job Manager exit?

Cliff Resnick
Hi,

Is anything keeping this from being merged into master?

On Thu, Nov 3, 2016 at 10:56 AM, Ufuk Celebi <[hidden email]> wrote:
A fix is pending here: https://github.com/apache/flink/pull/2750

With this change, the behaviour on graceful shutdown/suspension
respects the configured cancellation behaviour.
