Hi,
Consider managed keyed state backed by HDFS, with checkpointing enabled. As the state grows, its data is saved to HDFS. Now, let's say we clear the state. Would the state data be removed from HDFS too? How does Flink remove the state data from the state backend when the keyed state is cleared?
--
Garvit Sharma
github.com/garvitlnmiit/
No Body is a Scholar by birth, its only hard work and strong determination that makes him master.
Hi Garvit,

> Now, let's say we clear the state. Would the state data be removed from HDFS too?

The state data is not removed from HDFS immediately when you clear the state in your job. However, once you have cleared it, checkpoints completed afterwards will no longer contain it.
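To make this behaviour concrete, here is a toy Java model. It is not Flink's implementation, and the class names `ToyCheckpointStore` and `RetentionDemo` are made up for illustration. It assumes that each checkpoint is an immutable copy of the local keyed state taken at snapshot time, and that only the newest `numRetained` completed checkpoints are kept, roughly what Flink's `state.checkpoints.num-retained` option controls. It ignores how Flink actually lays checkpoint files out on HDFS.

```java
import java.util.ArrayDeque;
import java.util.Collections;
import java.util.Deque;
import java.util.HashMap;
import java.util.Map;

// Demo: clearing a key leaves older checkpoints untouched; the data only
// disappears once those checkpoints are discarded by the retention limit.
class RetentionDemo {
    public static void main(String[] args) {
        Map<String, String> localState = new HashMap<>(); // stands in for the local state backend
        ToyCheckpointStore store = new ToyCheckpointStore(1);

        localState.put("user-42", "some value");
        store.checkpoint(localState);   // chk-1 contains user-42
        localState.remove("user-42");   // analogous to ValueState.clear() for that key
        System.out.println(store.anyCheckpointContains("user-42")); // true: chk-1 is unchanged
        store.checkpoint(localState);   // chk-2 lacks user-42; chk-1 is dropped (limit = 1)
        System.out.println(store.anyCheckpointContains("user-42")); // false
    }
}

// Toy model only, NOT Flink code: checkpoints are immutable snapshots and
// only the newest `numRetained` of them are kept.
class ToyCheckpointStore {
    private final int numRetained;
    private final Deque<Map<String, String>> retained = new ArrayDeque<>();

    ToyCheckpointStore(int numRetained) {
        if (numRetained < 1) throw new IllegalArgumentException("numRetained must be >= 1");
        this.numRetained = numRetained;
    }

    // Copy the current local state into a new, read-only checkpoint.
    void checkpoint(Map<String, String> localState) {
        retained.addLast(Collections.unmodifiableMap(new HashMap<>(localState)));
        // Discard the oldest checkpoints once the retention limit is exceeded.
        while (retained.size() > numRetained) {
            retained.removeFirst();
        }
    }

    // True if any retained checkpoint still holds the given key.
    boolean anyCheckpointContains(String key) {
        for (Map<String, String> cp : retained) {
            if (cp.containsKey(key)) return true;
        }
        return false;
    }

    int size() {
        return retained.size();
    }
}
```

With a larger retention count, the cleared value survives in HDFS for correspondingly more checkpoint intervals before the last checkpoint holding it is discarded.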
> How does Flink manage to clear the state data from the state backend on clearing the keyed state?

1. You can use {{state.checkpoints.num-retained}} to set the number of completed checkpoints that are retained on HDFS.
2. If you set {{env.getCheckpointConfig().enableExternalizedCheckpoints(ExternalizedCheckpointCleanup.DELETE_ON_CANCELLATION)}}, the checkpoints on HDFS are removed once your job finishes (or is cancelled). If you set {{env.getCheckpointConfig().enableExternalizedCheckpoints(ExternalizedCheckpointCleanup.RETAIN_ON_CANCELLATION)}}, the checkpoints are retained.

Please refer to https://ci.apache.org/projects/flink/flink-docs-release-1.5/ops/state/checkpoints.html for more information.

Additionally, I'd like to give a brief overview of checkpoints on HDFS. In a nutshell, whatever you do with the state in your running job only affects the content of the local state backend. When checkpointing, Flink takes a snapshot of the local state backend and sends it to the checkpoint target directory (in your case, HDFS). The checkpoints on HDFS are like periodic snapshots of your job's state backend: they can be created or deleted, but never changed.

Maybe Stefan (cc) can give you more detailed information; please correct me if I'm wrong.

Best,
Sihua

On 06/21/2018 14:40, [hidden email] wrote:
So, would it delete all the files in HDFS associated with the cleared state?

On Thu, Jun 21, 2018 at 12:58 PM sihua zhou <[hidden email]> wrote:
Hi Garvit,
Let's say you clear the state at timestamp t1. Checkpoints completed before t1 will still contain the data you cleared, but checkpoints completed after t1 won't contain it. That said, I'm not sure what you mean by clearing the state: currently you can only clear a single key-value pair of the state, not the whole state at once.

Best,
Sihua
On 06/21/2018 15:41,[hidden email] wrote:
I am maintaining state data for a key in ValueState. As per [0], I can clear() the state for that key. Please let me know.

Thanks,

On Thu, Jun 21, 2018 at 1:19 PM sihua zhou <[hidden email]> wrote:
Now, after clearing the state for a key, I don't want that redundant data to remain in the state backend. This is my concern. Please let me know if there are any gaps.

Thanks,

On Thu, Jun 21, 2018 at 1:31 PM Garvit Sharma <[hidden email]> wrote:
Yes, you can clear the state for a key (the currently active key). Clearing it also removes it from the state backend, and future checkpoints won't contain the key anymore unless you add it again.
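The per-key behaviour can be illustrated with a toy Java model. The classes `ToyKeyedBackend` and `ToyValueState` are hypothetical and are neither Flink's API nor its implementation. Each `value()`, `update()` and `clear()` call acts only on the currently active key, so clearing key "a" leaves key "b" untouched, and a snapshot taken afterwards no longer contains "a". For simplicity the model assumes a single state per backend and `String` keys.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

// Demo: clear() removes only the active key's entry from the backend.
class KeyedStateDemo {
    public static void main(String[] args) {
        ToyKeyedBackend backend = new ToyKeyedBackend();
        ToyValueState<Long> count = new ToyValueState<>(backend);

        backend.setCurrentKey("a");
        count.update(1L);
        backend.setCurrentKey("b");
        count.update(2L);

        backend.setCurrentKey("a");
        count.clear();                                   // removes only key "a"

        Map<String, Object> snapshot = backend.snapshot(); // like the next checkpoint
        System.out.println(snapshot.containsKey("a"));     // false
        System.out.println(snapshot.get("b"));             // 2
    }
}

// Toy keyed backend: one map from key to value (a single state only).
class ToyKeyedBackend {
    private final Map<String, Object> entries = new HashMap<>();
    private String currentKey;

    void setCurrentKey(String key) {
        this.currentKey = Objects.requireNonNull(key);
    }

    String currentKey() {
        if (currentKey == null) throw new IllegalStateException("no active key");
        return currentKey;
    }

    Map<String, Object> entries() {
        return entries;
    }

    // Copy of the current contents, standing in for a checkpoint.
    Map<String, Object> snapshot() {
        return new HashMap<>(entries);
    }
}

// Toy ValueState-like handle scoped to the backend's active key.
class ToyValueState<T> {
    private final ToyKeyedBackend backend;

    ToyValueState(ToyKeyedBackend backend) {
        this.backend = backend;
    }

    @SuppressWarnings("unchecked")
    T value() {
        return (T) backend.entries().get(backend.currentKey()); // null if unset or cleared
    }

    void update(T v) {
        backend.entries().put(backend.currentKey(), v);
    }

    void clear() {
        backend.entries().remove(backend.currentKey());
    }
}
```

In a real job the framework sets the active key before invoking a keyed function (for example a `RichFlatMapFunction` applied to a `KeyedStream`), so a `clear()` call there affects only the key of the element being processed.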
Best, Sihua
On 06/21/2018 16:04,[hidden email] wrote:
Thank you for the clarification.

On Thu, Jun 21, 2018 at 1:36 PM sihua zhou <[hidden email]> wrote: