Checkpoint _metadata file size differs by >20x among checkpoints

Yu Yang
Hi all, 

We have a Flink job that checkpoints every 10 minutes. We noticed that the _metadata file size varies a lot across this job's checkpoints: in some checkpoints it is >900 MB, while in others it is <4 MB. Any insights into what may cause the difference?

Thank you!

Regards, 
-Yu

Arvid Heise
Hi Yu,

Are you using incremental checkpoints [1]? If so, the smaller checkpoints would be the deltas and the larger ones the complete state.


In reply to Yu Yang (Wed, Mar 4, 2020, 6:41 PM)

Congxian Qiu
Hi

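Maybe the checkpoint contains some ByteStreamStateHandles: state chunks smaller than `state.backend.fs.memory-threshold` are stored inline in the _metadata file instead of in separate files. To verify this, you can lower `state.backend.fs.memory-threshold` and check whether the _metadata file shrinks. Please be careful when setting this option, because it may produce many small files.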
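To make this trade-off concrete, here is a simplified Python model (not Flink code) of how a size threshold splits state chunks between the _metadata file and separate files. The `layout_for` function, the `CheckpointLayout` class, and the chunk sizes are all hypothetical. The model ignores the small handle entries that _metadata keeps for file-backed chunks and does not try to reproduce Flink's exact behavior at the threshold boundary.

```python
from dataclasses import dataclass
from typing import List

# Flink's configuration docs state a 1 MB maximum for state.backend.fs.memory-threshold.
MAX_THRESHOLD_BYTES = 1024 * 1024


@dataclass
class CheckpointLayout:
    inline_bytes: int    # bytes that would be embedded directly in _metadata
    separate_files: int  # number of chunks written to their own files
    file_bytes: int      # total bytes in those separate files


def layout_for(chunk_sizes: List[int], memory_threshold: int) -> CheckpointLayout:
    """Hypothetical simplified model: chunks smaller than the threshold are
    inlined into _metadata; all other chunks become separate files.
    Handle entries that _metadata keeps for file-backed chunks are ignored."""
    if not 0 <= memory_threshold <= MAX_THRESHOLD_BYTES:
        raise ValueError("threshold must be between 0 and 1 MB")
    inline = [s for s in chunk_sizes if s < memory_threshold]
    files = [s for s in chunk_sizes if s >= memory_threshold]
    return CheckpointLayout(sum(inline), len(files), sum(files))


if __name__ == "__main__":
    # Hypothetical chunk sizes in bytes, not measured from any real job.
    chunks = [500, 800, 15_000, 900_000, 3_000_000]
    for threshold in (1024, MAX_THRESHOLD_BYTES, 0):
        print(threshold, layout_for(chunks, threshold))
```

With these made-up sizes, a 1 MB threshold puts 916,300 bytes inline in _metadata, while a threshold of 0 keeps _metadata free of inline data but writes all five chunks as separate files, which is the many-small-files cost mentioned above.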

Best,
Congxian


In reply to Arvid Heise (Thu, Mar 5, 2020, 2:26 AM)