Hi,
How are concurrent snapshots taken for an operator? Let's say an operator receives barriers for a checkpoint from all of its inputs. It triggers the checkpoint. Now, the checkpoint starts getting saved asynchronously. Before the checkpoint is acknowledged, the operator receives all barriers for all inputs for the next checkpoint. What will happen in this case if no concurrent checkpoints are allowed (i.e. the default value is used)? What will happen if concurrent checkpoints are allowed? Thanks, Narendra Joshi |
Hi Narendra,
according to [1], even with asynchronous state snapshots (see [2]), a checkpoint is only complete after all sinks have received the barriers and all (asynchronous) snapshots have been processed. Since, if the number of concurrent checkpoints is 0, no checkpoint barriers will be emitted until the previous checkpoint is complete (see [1]), you will not get into the situation where two asynchronous snapshots are being taken concurrently. If you enable concurrent checkpoints and asynchronous snapshots , they will process concurrently but on different snapshots of the state, i.e. although they are running in parallel, each stores the expected state. If you want to know more about the details of how this is done, I can recommend Stefan's (cc'd) talk at Flink Forward last week [4]. He may also be able to answer in more detail in case I missed something. Nico [1] https://ci.apache.org/projects/flink/flink-docs-release-1.3/internals/ stream_checkpointing.html#asynchronous-state-snapshots [2] https://ci.apache.org/projects/flink/flink-docs-release-1.3/ops/ state_backends.html [3] https://ci.apache.org/projects/flink/flink-docs-release-1.3/dev/stream/ checkpointing.html [4] https://www.youtube.com/watch? v=dWQ24wERItM&index=36&list=PLDX4T_cnKjD0JeULl1X6iTn7VIkDeYX_X On Thursday, 21 September 2017 13:26:17 CEST Narendra Joshi wrote: > Hi, > > How are concurrent snapshots taken for an operator? > Let's say an operator receives barriers for a checkpoint from all of > its inputs. It triggers the checkpoint. Now, the checkpoint starts > getting saved asynchronously. Before the checkpoint is acknowledged, > the operator receives all barriers for all inputs for the next > checkpoint. What will happen in this case if no concurrent checkpoints > are allowed (i.e. the default value is used)? What will happen if > concurrent checkpoints are allowed? > > Thanks, > Narendra Joshi signature.asc (201 bytes) Download Attachment |
Nico Kruber <[hidden email]> writes:
> Hi Narendra, > according to [1], even with asynchronous state snapshots (see [2]), a > checkpoint is only complete after all sinks have received the barriers and all > (asynchronous) snapshots have been processed. Since, if the number of > concurrent checkpoints is 0, no checkpoint barriers will be emitted until the > previous checkpoint is complete (see [1]), you will not get into the situation > where two asynchronous snapshots are being taken concurrently. Does this mean that operators would stop processing streams (because they received all barriers for a new checkpoint) and wait for the ongoing asynchronous checkpoint to complete or it means that no barriers would be injected into sources before checkpoint finishes? > If you enable concurrent checkpoints and asynchronous snapshots , they will > process concurrently but on different snapshots of the state, i.e. although > they are running in parallel, each stores the expected state. > If you want to know more about the details of how this is done, I can > recommend Stefan's (cc'd) talk at Flink Forward last week [4]. He may also be > able to answer in more detail in case I missed something. Thanks for the reference to the talk! :) > > > Nico > > > [1] https://ci.apache.org/projects/flink/flink-docs-release-1.3/internals/ > stream_checkpointing.html#asynchronous-state-snapshots > [2] https://ci.apache.org/projects/flink/flink-docs-release-1.3/ops/ > state_backends.html > [3] https://ci.apache.org/projects/flink/flink-docs-release-1.3/dev/stream/ > checkpointing.html > [4] https://www.youtube.com/watch? > v=dWQ24wERItM&index=36&list=PLDX4T_cnKjD0JeULl1X6iTn7VIkDeYX_X > > On Thursday, 21 September 2017 13:26:17 CEST Narendra Joshi wrote: >> Hi, >> >> How are concurrent snapshots taken for an operator? >> Let's say an operator receives barriers for a checkpoint from all of >> its inputs. It triggers the checkpoint. Now, the checkpoint starts >> getting saved asynchronously. Before the checkpoint is acknowledged, >> the operator receives all barriers for all inputs for the next >> checkpoint. What will happen in this case if no concurrent checkpoints >> are allowed (i.e. the default value is used)? What will happen if >> concurrent checkpoints are allowed? >> >> Thanks, >> Narendra Joshi > -- Narendra Joshi |
On Thursday, 21 September 2017 20:08:01 CEST Narendra Joshi wrote:
> Nico Kruber <[hidden email]> writes: > > according to [1], even with asynchronous state snapshots (see [2]), a > > checkpoint is only complete after all sinks have received the barriers and > > all (asynchronous) snapshots have been processed. Since, if the number of > > concurrent checkpoints is 0, no checkpoint barriers will be emitted until > > the previous checkpoint is complete (see [1]), you will not get into the > > situation where two asynchronous snapshots are being taken concurrently. > > Does this mean that operators would stop processing streams (because > they received all barriers for a new checkpoint) and wait for > the ongoing asynchronous checkpoint to complete or it means that no > barriers would be injected into sources before checkpoint finishes? The only thing that is waiting for asynchronous state snapshots to complete is the checkpoint coordinator (in any case!) since a checkpoint is only complete once all operators have stored their state. Operation continues as expected. Nico signature.asc (201 bytes) Download Attachment |
Free forum by Nabble | Edit this page |