Hello,

We're using the Table API and want to optimize checkpoint alignment times. We were advised to consider at-least-once mode. I'd like to understand how checkpointing works in at-least-once mode so that I know the caveats and can evaluate whether it will work for us.

Thanks!

--
Rex Fenley | Software Engineer - Mobile and Backend
Remind.com
Hey Rex,

You will probably find the link below helpful; it explains how at-least-once mode (no alignment) differs from exactly-once mode (requires alignment), and how the alignment phase is skipped in at-least-once mode.

At a high level, in at-least-once mode a task with multiple input channels:

1. does NOT block processing to wait for barriers from all inputs, meaning the task keeps processing data from a channel after receiving a barrier on it, even if it has multiple inputs;
2. still takes its snapshot only after seeing the checkpoint barrier from all input channels.

As a result, snapshot N may contain state changes caused by data from epoch N+1. After a restore, that epoch N+1 data is replayed and applied a second time; that's where "at least once" comes from.

On Tue, Jan 12, 2021 at 1:03 PM Rex Fenley <[hidden email]> wrote:
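To make the two points in the reply above concrete, here is a toy simulation of a single task with two input channels and a running sum as its state. It is only a sketch of the barrier handling as described in this thread, not Flink's actual implementation; `run_task`, `BARRIER`, and the arrival order are hypothetical names and data made up for illustration.

```python
BARRIER = "barrier"  # hypothetical marker for a checkpoint barrier


def run_task(arrivals, num_channels, aligned):
    """Process (channel, item) pairs in arrival order.

    item is either an int record (added to a running sum) or BARRIER.
    aligned=True models exactly-once: a channel that already delivered its
    barrier is held back until all other barriers arrive.
    aligned=False models at-least-once: records are processed immediately.
    Returns (state captured in the snapshot, final state).
    """
    state = 0
    seen = set()      # channels whose barrier has arrived for this checkpoint
    buffered = []     # records held back during alignment
    snapshot = None
    for channel, item in arrivals:
        if item == BARRIER:
            seen.add(channel)
            if len(seen) == num_channels:
                snapshot = state          # snapshot once all barriers are seen
                for rec in buffered:      # alignment over: release held records
                    state += rec
                buffered.clear()
                seen.clear()
        elif aligned and channel in seen:
            buffered.append(item)         # post-barrier record waits
        else:
            state += item
    return snapshot, state


# Channel 0 delivers its barrier early; its record 10 belongs to epoch N+1.
arrivals = [(0, 1), (1, 2), (0, BARRIER), (0, 10), (1, 3), (1, BARRIER)]

exactly_once = run_task(arrivals, 2, aligned=True)
at_least_once = run_task(arrivals, 2, aligned=False)
print(exactly_once)   # (6, 16)
print(at_least_once)  # (16, 16)
```

In the aligned run the snapshot holds 6, which is exactly the sum of the pre-barrier records 1, 2, and 3. In the unaligned run the snapshot holds 16 because the epoch N+1 record 10 was already applied; if the job restored from that snapshot and channel 0 replayed everything after its barrier, 10 would be added again and the sum would become 26 instead of 16.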
Thanks for the info. It sounds like any state that does not have some form of uniqueness could end up being incorrect.

Specifically, in my case all rows passing through the execution graph have unique ids. However, any operator that groups by foreign_key and then computes a sum/count could end up with an inconsistent count. Normally a retract (-1) followed by an insert (+1) would keep the count correct, but with at-least-once a retract (-1) may come from epoch n+1 and therefore be played twice, making the count lower than it should be.

Am I understanding this correctly?

Thanks!

On Mon, Jan 11, 2021 at 10:06 PM Yuan Mei <[hidden email]> wrote:
At-least-once usually works if the use case can tolerate a certain level of duplication or if the computation is idempotent.

I'm not completely sure how the "retract (-1)" and "insert (+1)" work in your case, but any input data that leads to a state change (a count/sum change) can be played twice after a recovery.
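The difference between a duplication-sensitive count and an idempotent one can be sketched with a small model of the scenario discussed above. This is a toy illustration under stated assumptions, not how the Table API actually lays out its state: the changelog tuples `(op, row_id, fk)`, the helper names, and the premise that only the retract leaked into snapshot N are all hypothetical.

```python
from collections import defaultdict

# Hypothetical changelog events: (op, row_id, foreign_key), op is +1 or -1.
epoch_n = [(+1, 1, "a"), (+1, 2, "a"), (+1, 3, "a")]
# In epoch n+1, row 3 moves from group "a" to group "b".
epoch_n1 = [(-1, 3, "a"), (+1, 3, "b")]


def apply_counter(counts, events):
    """Non-idempotent: every event shifts the count, duplicates included."""
    for op, _row_id, fk in events:
        counts[fk] += op
    return counts


def apply_id_sets(members, events):
    """Idempotent: state is the set of row ids per key; repeats are no-ops."""
    for op, row_id, fk in events:
        if op > 0:
            members[fk].add(row_id)
        else:
            members[fk].discard(row_id)
    return members


def recover_with_counter():
    counts = apply_counter(defaultdict(int), epoch_n)
    apply_counter(counts, epoch_n1[:1])      # assumed: retract leaks into snapshot N
    restored = defaultdict(int, counts)      # restore from snapshot N
    apply_counter(restored, epoch_n1)        # epoch n+1 is replayed in full
    return dict(restored)


def recover_with_id_sets():
    members = apply_id_sets(defaultdict(set), epoch_n)
    apply_id_sets(members, epoch_n1[:1])     # same leak into snapshot N
    restored = defaultdict(set, {k: set(v) for k, v in members.items()})
    apply_id_sets(restored, epoch_n1)        # same full replay
    return {fk: len(ids) for fk, ids in restored.items()}


print(recover_with_counter())  # {'a': 1, 'b': 1}
print(recover_with_id_sets())  # {'a': 2, 'b': 1}
```

The correct result is 2 rows in group "a" and 1 in group "b". The plain counter ends up with 1 for "a" because the retract is applied twice, while the id-set version tolerates the replay because discarding an id that is already gone changes nothing. The trade-off is that the idempotent version keeps every row id in state rather than a single number per key.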
Thanks! On Tue, Jan 12, 2021 at 1:56 AM Yuan Mei <[hidden email]> wrote: