Hi Marco
You first need to figure out why the checkpoint timed out. The UI shows how much time each phase of a checkpoint consumed. If the checkpoint really does need that long to complete, then you need to configure a longer timeout.
If there are checkpoint errors, we first need to figure out what the problem is. In general, a checkpoint can be split into several parts: barrier alignment (backpressure or something else may prevent some barriers from being received in time), sync duration (the thread is too busy ...), async duration (too much IO/network processing ...), etc.
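For reference, here is a minimal sketch of the relevant settings in the Flink configuration file. The values are placeholders I picked for illustration, not recommendations; choose them from the durations you actually see in the UI. The default timeout is 10 minutes.

```yaml
# Illustrative values only (assumptions, not tuned for your job).
# How often a checkpoint is triggered.
execution.checkpointing.interval: 1 min
# How long a checkpoint may run before it is declared timed out (default: 10 min).
execution.checkpointing.timeout: 20 min
# Minimum idle time between the end of one checkpoint and the start of the next.
execution.checkpointing.min-pause: 30 s
```

The same settings can also be applied in code through the job's CheckpointConfig. Raising the timeout only helps if the checkpoint genuinely needs the time; if the delay comes from backpressure during barrier alignment, the cause has to be fixed instead.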
I am kind of stuck in determining how large a checkpoint interval should be.
Is there a guide for that? If the timeout is 10 minutes and we time out, what is a good strategy for adjusting that?
What is a good starting point for a checkpoint interval? How should it be adjusted?
We often see checkpoint errors during our onTimer calls; I don't know if that's related.
Marco A. Villalobos