Hi all,
We're interested in analyzing how the size of our savepoints and state affects the time it takes to restore from a savepoint. We're running Flink 1.12 on Kubernetes, with RocksDB as the state backend.
What is the best way to measure the size of a Flink application's state? Is state.backend.rocksdb.metrics.total-sst-files-size the right thing to look at?
We looked at state.backend.rocksdb.metrics.total-sst-files-size for all our operators after restoring from a savepoint, and noticed that the sum of all the SST file sizes is much smaller than the total size of our savepoint (7GB vs 10TB). Where does that discrepancy come from?
Do you have any general advice on correlating savepoint size with restore times?
Thanks in advance!
Hi Kevin,
I don't think there is a metric for the time spent restoring from a savepoint. As for why there is such a huge difference between the size of the SST files and the size of the savepoint, I think @Yun can give some detailed insights.
Best,
Guowei
On Thu, Apr 1, 2021 at 1:38 AM Kevin Lam <[hidden email]> wrote:
Hi Kevin,
Currently, you can check the logs for when the restore starts and finishes [1] to see how much time is spent on the task side. Flink 1.13 also tries to expose the stages of task initialization [2], which may help you.
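To turn two such log lines into a duration, a minimal Python sketch could look like the following. It assumes the log lines start with a timestamp in the `yyyy-MM-dd HH:mm:ss,SSS` layout; if your logging configuration uses a different pattern, adjust the format string. The function name is hypothetical and not part of Flink.

```python
from datetime import datetime

# Assumption: each log line starts with "YYYY-MM-DD HH:MM:SS,mmm".
# Change this if your log4j pattern differs.
TIMESTAMP_FORMAT = "%Y-%m-%d %H:%M:%S,%f"
TIMESTAMP_LENGTH = 23  # number of characters in the timestamp prefix


def restore_duration_seconds(start_line: str, end_line: str) -> float:
    """Return the seconds elapsed between the timestamps prefixing two log lines."""

    def parse(line: str) -> datetime:
        # Only the leading timestamp is parsed; the rest of the line is ignored.
        return datetime.strptime(line[:TIMESTAMP_LENGTH], TIMESTAMP_FORMAT)

    return (parse(end_line) - parse(start_line)).total_seconds()
```

You would pass in the line that marks the start of the restore and the line that marks its end for a given task, copied from the TaskManager log.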
state.backend.rocksdb.metrics.total-sst-files-size should be the correct metric for the SST file size. There can be several reasons why the savepoint is larger than the SST files.
However, the difference is really huge. Have you ever logged in to the machines holding keyed state to see how much disk space is occupied? Also, what is the incremental checkpoint size of your job, and have you ever enabled TTL for state?
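For the on-disk check, a rough Python sketch that sums the size of all `.sst` files under a directory might help. The directory you pass in is an assumption: it should be wherever RocksDB keeps its local working files on that machine, which depends on your setup. The function name is hypothetical.

```python
import os


def total_sst_bytes(root: str) -> int:
    """Sum the sizes, in bytes, of all *.sst files found under root (recursively)."""
    total = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            # Count only RocksDB SST files; skip logs, manifests and other files.
            if name.endswith(".sst"):
                total += os.path.getsize(os.path.join(dirpath, name))
    return total
```

Running this on each machine and adding up the results gives a number to compare against the sum of the total-sst-files-size metric.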
Best
Yun Tang
From: Guowei Ma <[hidden email]>
Sent: Thursday, April 1, 2021 11:57
To: Kevin Lam <[hidden email]>
Cc: user <[hidden email]>; Yun Tang <[hidden email]>
Subject: Re: Measuring the Size of State, Savepoint Size vs. Restore time