We're doing something bad with our Flink state. We just launched a feature that creates very large values (lists of objects that we append to) in MapState. Our checkpoints time out (after 10 minutes). I'm assuming the values are too big. Backpressure looks okay, and the CPU and memory metrics look okay.

Questions:
1. Is there an easy tool for inspecting Flink state? I found this post about drilling into Flink state, but I was hoping for something more like a CLI.
2. Is there a way to break down the time spent during a checkpoint if it times out?

Thanks!
- Dan
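As a rough illustration of why "lists of objects that we append to" inside a single MapState value can get expensive, here is a minimal cost sketch. It assumes a state backend that serializes the whole value on every write (as the RocksDB backend does for each map entry); the thread does not say which backend is in use. It is plain Python, not Flink code, and the length-prefixed encoding and the function names are hypothetical stand-ins for a real serializer and state API.

```python
import struct


def encode_list(items):
    """Length-prefixed concatenation: a stand-in for a list serializer."""
    return b"".join(struct.pack(">I", len(x)) + x for x in items)


def decode_list(blob):
    """Inverse of encode_list."""
    items, pos = [], 0
    while pos < len(blob):
        (n,) = struct.unpack_from(">I", blob, pos)
        pos += 4
        items.append(blob[pos:pos + n])
        pos += n
    return items


def append_to_list_value(store, key, item):
    """Read-modify-write: decode the whole list, append, re-encode it all.

    Returns the number of value bytes written for this one append.
    """
    items = decode_list(store.get(key, b""))
    items.append(item)
    store[key] = encode_list(items)
    return len(store[key])


def append_as_entry(store, key, index, item):
    """Store each element under its own (key, index) entry.

    Returns the number of value bytes written (key bytes are not counted).
    """
    store[(key, index)] = item
    return len(item)


def total_bytes_written(n, item_size=100):
    """Total value bytes written after n appends under each layout."""
    item = b"x" * item_size
    one_big_value, many_entries = {}, {}
    rewrite = sum(append_to_list_value(one_big_value, "k", item) for _ in range(n))
    per_entry = sum(append_as_entry(many_entries, "k", i, item) for i in range(n))
    return rewrite, per_entry


if __name__ == "__main__":
    print(total_bytes_written(1000))  # (52052000, 100000)
```

Under these assumptions the single-value layout writes a number of bytes that grows quadratically with the list length, while the per-entry layout grows linearly. Whether this matches the actual job depends on the backend and serializer in use.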
Hi Dan,

Flink should already have a tool integrated in the web UI for monitoring detailed checkpoint statistics [1]. It shows the time spent in each phase and by each task, so it can be used to debug a checkpoint timeout.

Best,
Yun

[1] https://ci.apache.org/projects/flink/flink-docs-master/docs/ops/monitoring/checkpoint_monitoring/
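For a more script-friendly view of the same statistics, the checkpoint details can also be read from the JobManager REST API. The sketch below is a minimal example, assuming the `/jobs/:jobid/checkpoints/details/:checkpointid` endpoint and per-task fields named `end_to_end_duration`, `state_size`, `num_subtasks` and `num_acknowledged_subtasks`; these should be verified against the REST documentation for the Flink version in use. The base URL, job ID and checkpoint ID are placeholders, and the function names are hypothetical.

```python
import json
import urllib.request


def fetch_checkpoint_details(base_url, job_id, checkpoint_id):
    """Fetch one checkpoint's details from the JobManager REST API.

    Assumes the endpoint /jobs/:jobid/checkpoints/details/:checkpointid.
    """
    url = f"{base_url}/jobs/{job_id}/checkpoints/details/{checkpoint_id}"
    with urllib.request.urlopen(url) as resp:
        return json.load(resp)


def rank_tasks(details, top=5):
    """Rank tasks by likely contribution to a slow or timed-out checkpoint.

    Tasks with unacknowledged subtasks come first, then the longest
    end-to-end durations. Each row is
    (vertex_id, pending_subtasks, duration_ms, state_size_bytes).
    Field names are assumptions; missing fields default to 0.
    """
    rows = []
    for vertex_id, task in details.get("tasks", {}).items():
        pending = task.get("num_subtasks", 0) - task.get("num_acknowledged_subtasks", 0)
        rows.append((
            vertex_id,
            pending,
            task.get("end_to_end_duration", 0),
            task.get("state_size", 0),
        ))
    rows.sort(key=lambda r: (r[1], r[2]), reverse=True)
    return rows[:top]
```

A possible usage, with placeholder values: `rank_tasks(fetch_checkpoint_details("http://localhost:8081", "<job-id>", "<checkpoint-id>"))`. For a checkpoint that timed out, the tasks with pending subtasks are the ones that never acknowledged.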
Hi Yun,

The UI was not useful in this case. I had a feeling beforehand about what the issue was. We refactored the state, and the checkpoint is now 10x faster.

On Mon, Jun 14, 2021 at 5:47 AM Yun Gao <[hidden email]> wrote: