Stop vs Cancel with savepoint

Thomas Eckestad-2
Hi!

Cancel with savepoint is marked as deprecated in the CLI documentation, but it is not marked as deprecated in the REST API documentation. Is that a mistake? At the very least, it would be appropriate to include some recommendation regarding stop vs cancel in the API docs.

As I understand it, stop shuts down each operator in the job DAG gracefully, starting from the sources. Conceptually, the sources are stopped first; then, once the operators directly downstream of the sources have drained all pending input, those are stopped as well. This continues until the sinks are done too. Or, maybe more to the point: the checkpoint barrier triggered for the savepoint will not be followed by any more input data, because the sources stop consuming new data until the savepoint is complete and the job exits.

Is the above understanding correct? If so, for some streaming jobs without exactly-once sinks, cancel with savepoint might cause duplication, since records processed between the savepoint and the cancellation are processed again after restoring from the savepoint. That should be OK, of course, since the job needs to handle a restart anyway, but it might be beneficial not to generate duplicated output in this specific case when both alternatives cost the same to implement.
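To make the duplication argument concrete, here is a toy model in Python. It is not Flink code: `run_and_restore`, `savepoint_at` and `in_flight_after_savepoint` are hypothetical names, and the model assumes a sink that writes every record immediately (no transactions) and a source that resumes from the offset stored in the savepoint.

```python
from collections import Counter


def run_and_restore(records, savepoint_at, in_flight_after_savepoint, graceful_stop):
    """Toy model: return everything a non-transactional sink has written
    after one shutdown with savepoint followed by a restore."""
    sink = []

    # First run: everything before the savepoint barrier reaches the sink.
    sink.extend(records[:savepoint_at])

    if not graceful_stop:
        # Cancel with savepoint (as modelled here): the source keeps emitting
        # after the barrier until the cancel takes effect.
        end = min(len(records), savepoint_at + in_flight_after_savepoint)
        sink.extend(records[savepoint_at:end])
    # Stop with savepoint (as modelled here): nothing follows the barrier.

    # Restore: the source resumes from the offset stored in the savepoint.
    sink.extend(records[savepoint_at:])
    return sink


def duplicates(sink):
    """Records that were written to the sink more than once."""
    return sorted(r for r, n in Counter(sink).items() if n > 1)


records = list(range(10))
after_stop = run_and_restore(records, 6, 3, graceful_stop=True)
after_cancel = run_and_restore(records, 6, 3, graceful_stop=False)
print(duplicates(after_stop))    # []
print(duplicates(after_cancel))  # [6, 7, 8]
```

In this model the only difference between the two shutdown modes is whether any records follow the savepoint barrier, which is exactly the window that produces duplicates after a restore.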

Is my understanding of cancel vs stop correct? If not, what is the real practical difference between stop and cancel with savepoint?

To me it feels like cancel with savepoint should be deprecated in both the REST API and the CLI, and there should also be a text explaining why it is deprecated and why its use is discouraged.

Thanks,
Thomas
Thomas Eckestad
Systems Engineer
Road Perception

NIRA Dynamics AB
Wallenbergs gata 4
58330 Linköping, Sweden
Mobile: +46 738 453 937
[hidden email]
www.niradynamics.se

Re: Stop vs Cancel with savepoint

Chesnay Schepler
Your understanding of cancel vs stop(-with-savepoint) is correct.

I agree that we should update the REST API documentation and have a
section outlining the problems with cancel-with-savepoint.
Would you like to open a ticket yourself?
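For reference, a rough sketch of how the two REST calls under discussion differ. The endpoint paths and JSON field names below are written from memory of the Flink REST API documentation and should be treated as assumptions to verify against the docs for your Flink version; the helper function names are hypothetical. The functions only build request descriptions and do not contact a cluster.

```python
import json


def stop_with_savepoint_request(job_id, target_directory, drain=False):
    """Describe the stop-with-savepoint call (assumed: POST /jobs/:jobid/stop)."""
    return {
        "method": "POST",
        "path": f"/jobs/{job_id}/stop",
        # Assumed field names: "targetDirectory" and "drain".
        "body": json.dumps({"targetDirectory": target_directory, "drain": drain}),
    }


def cancel_with_savepoint_request(job_id, target_directory):
    """Describe the cancel-with-savepoint call
    (assumed: POST /jobs/:jobid/savepoints with "cancel-job" set)."""
    return {
        "method": "POST",
        "path": f"/jobs/{job_id}/savepoints",
        # Assumed field names: "target-directory" and "cancel-job".
        "body": json.dumps({"target-directory": target_directory, "cancel-job": True}),
    }


# Hypothetical job id and savepoint directory, for illustration only.
stop_req = stop_with_savepoint_request("<job-id>", "file:///tmp/savepoints")
cancel_req = cancel_with_savepoint_request("<job-id>", "file:///tmp/savepoints")
print(stop_req["method"], stop_req["path"], stop_req["body"])
print(cancel_req["method"], cancel_req["path"], cancel_req["body"])
```

On the command line the corresponding operations are, as far as I recall, `flink stop --savepointPath <dir> <jobId>` and the deprecated `flink cancel -s <dir> <jobId>`.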

On 3/3/2021 11:16 AM, Thomas Eckestad wrote:

> [...]

Re: Stop vs Cancel with savepoint

Thomas Eckestad-2
OK, thank you for validating my thoughts =) I created https://issues.apache.org/jira/browse/FLINK-21666#

Thanks,
Thomas

On 3 Mar 2021, at 22:02, Chesnay Schepler <[hidden email]> wrote:

[...]