Hi Rahul,
The timers are fault tolerant, and a timer's timestamp is the absolute time at which it should fire.
This means that if you are at time t = 10 and you register a timer “10 ms from now”, the timer will have a firing timestamp of 20.
This is checkpointed, so the “new machine” that takes over the failed task will have the timer with timestamp 20.
So when the timer fires depends on the “new machine”, and it may differ from what would have happened on the previous machine in the
following cases:
- For processing time, if the new machine (the one that takes over the failed task) has a clock that is out of sync with the clock of the previous machine that set the timer to 20.
- For event time, given that Flink does not checkpoint watermarks, the timer will fire when the watermark on the new machine surpasses the timer's timestamp.
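To make the point about absolute timestamps concrete, here is a minimal sketch in plain Java. It is only a toy model and not Flink's API: the class TimerSketch and its methods register, snapshot, restore and advanceTo are hypothetical names I made up for illustration. The "time" passed to advanceTo stands for the local clock of the new machine (processing time) or its current watermark (event time), and I assume for simplicity that a timer fires once that time reaches its timestamp.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeSet;

// Hypothetical stand-in for a timer service; this is NOT Flink's API.
class TimerSketch {
    // Absolute firing timestamps, as they would be stored in a checkpoint.
    private final TreeSet<Long> timers = new TreeSet<>();

    // Register a timer "delay" ms from "now": only the absolute timestamp is kept.
    void register(long now, long delay) {
        timers.add(now + delay);
    }

    // Copy of the pending timers (what a checkpoint would contain in this model).
    List<Long> snapshot() {
        return new ArrayList<>(timers);
    }

    // Rebuild the timers on the "new machine" from a snapshot.
    static TimerSketch restore(List<Long> snapshot) {
        TimerSketch restored = new TimerSketch();
        restored.timers.addAll(snapshot);
        return restored;
    }

    // Remove and return every timer whose timestamp is at or before "time".
    // "time" is the local clock for processing time, or the watermark for event time.
    List<Long> advanceTo(long time) {
        List<Long> fired = new ArrayList<>(timers.headSet(time, true));
        timers.removeAll(fired);
        return fired;
    }

    public static void main(String[] args) {
        TimerSketch original = new TimerSketch();
        original.register(10L, 10L);                  // t = 10, "10 ms from now" -> 20
        List<Long> checkpoint = original.snapshot();

        // The task fails; the new machine starts from the checkpoint.
        TimerSketch recovered = TimerSketch.restore(checkpoint);
        System.out.println(recovered.advanceTo(15L)); // nothing is due yet
        System.out.println(recovered.advanceTo(25L)); // the timer with timestamp 20 fires
    }
}
```

Note that the restored timer carries no notion of "remaining time": it fires purely based on how the new machine's clock or watermark compares with the stored timestamp 20.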
I hope this helps,
Kostas
I am looking at timers in Apache Flink and wanted to confirm whether the timers in Flink are fault tolerant.
E.g., when a timer of, say, 20 sec is registered with a ProcessFunction and is running on a node, and the node fails for some reason after 15 seconds (since the timer started), does Flink guarantee that the timer resumes on another node? If it does resume, does it consider only the remaining time for the timer, i.e. 5 sec in this case?
Thanks & Regards,
Rahul