checkpoint interval and hdfs file capacity

checkpoint interval and hdfs file capacity

lec ssmi
Hi, if I set the checkpoint interval very small, say 5 seconds, will there be a lot of state files on HDFS? In theory, no matter what interval is set, every checkpoint deletes the old files and writes new ones, right?
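For context, the interval in question is a job-level setting; a minimal sketch of how it might appear in flink-conf.yaml, assuming Flink's documented `execution.checkpointing.interval` key (the value is illustrative):

```yaml
# Trigger a checkpoint every 5 seconds (the "very small" interval asked about).
execution.checkpointing.interval: 5s
```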
Re: checkpoint interval and hdfs file capacity

Congxian Qiu
Hi
    No matter what interval you set, Flink will take care of the checkpoints (it removes obsolete checkpoints when it can). But with a very small checkpoint interval, there can be very high pressure on the storage system (here, RPC pressure on the HDFS NameNode).

Best,
Congxian
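The retention behavior described in this reply is governed by configuration; a sketch, assuming Flink's documented keys (the path is illustrative):

```yaml
# Where completed checkpoints are written on HDFS.
state.checkpoints.dir: hdfs:///flink/checkpoints
# How many completed checkpoints Flink retains; older ones are discarded.
state.checkpoints.num-retained: 1
```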


lec ssmi <[hidden email]> wrote on Tue, Nov 10, 2020 at 1:19 PM:
Hi, if I set the checkpoint interval very small, say 5 seconds, will there be a lot of state files on HDFS? In theory, no matter what interval is set, every checkpoint deletes the old files and writes new ones, right?
Re: checkpoint interval and hdfs file capacity

lec ssmi
Thanks.
   I have some jobs with a checkpoint interval of 1000 ms, and the checkpoint files on HDFS grow so numerous that the jobs can no longer work normally.
What I am curious about is: are writing and deleting performed synchronously? Is it possible that new files are added faster than the old files can be deleted?

Congxian Qiu <[hidden email]> wrote on Tue, Nov 10, 2020 at 2:16 PM:
Hi
    No matter what interval you set, Flink will take care of the checkpoints (it removes obsolete checkpoints when it can). But with a very small checkpoint interval, there can be very high pressure on the storage system (here, RPC pressure on the HDFS NameNode).

Best,
Congxian


Re: checkpoint interval and hdfs file capacity

Congxian Qiu
Hi
    Currently, the checkpoint discard logic is executed in an Executor [1], so old checkpoints may not be deleted that quickly.


Best,
Congxian
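The asynchrony described in this reply can be illustrated with a small sketch in plain Python (not Flink's actual code; the durations and names are invented): a single background worker handles discards while the main loop keeps "writing" checkpoints, so when each delete takes longer than the interval, deletions queue up and old files linger.

```python
import time
from concurrent.futures import ThreadPoolExecutor

DISCARD_SECONDS = 0.2    # assumed time for one checkpoint deletion (slow HDFS RPCs)
INTERVAL_SECONDS = 0.01  # assumed checkpoint interval, much smaller than a delete

def discard_checkpoint(cp_id):
    """Simulate the slow, RPC-heavy delete of one superseded checkpoint."""
    time.sleep(DISCARD_SECONDS)
    return cp_id

# A single background worker handles discards, while the "job" keeps
# writing new checkpoints on the main thread.
executor = ThreadPoolExecutor(max_workers=1)
futures = []
for cp_id in range(5):
    futures.append(executor.submit(discard_checkpoint, cp_id))
    time.sleep(INTERVAL_SECONDS)  # next checkpoint triggers before deletes catch up

# Right after the last "write", most deletions are still queued.
pending = sum(not f.done() for f in futures)
print("pending discards right after the last write:", pending)

# Every discard does complete eventually, just later than the writes.
done = [f.result() for f in futures]
print("discard order:", done)
executor.shutdown()
```

This mirrors the questioner's scenario: the writes never block on the deletes, so a 1000 ms interval can outpace cleanup even though nothing is leaked in the end.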


lec ssmi <[hidden email]> wrote on Tue, Nov 10, 2020 at 2:25 PM:
Thanks.
   I have some jobs with a checkpoint interval of 1000 ms, and the checkpoint files on HDFS grow so numerous that the jobs can no longer work normally.
What I am curious about is: are writing and deleting performed synchronously? Is it possible that new files are added faster than the old files can be deleted?
