State life-cycle for different state-backend implementations

State life-cycle for different state-backend implementations

Rinat
Hi mates, got a question about different state backends.

As far as I understand, on every checkpoint Flink flushes its current state into the backend. In the case of FsStateBackend we'll have a separate file for each checkpoint, so over the job's lifecycle we risk accumulating a huge number of state files in HDFS, which is not very cool for a Hadoop name-node.

Does Flink have any clean-up strategies for its state in the different backend implementations? If you could provide any links where I could read more about this process, it'll be awesome ))

Thx a lot for your help.

Sincerely yours,
Rinat Sharipov
Software Engineer at 1DMP CORE Team

mobile: +7 (925) 416-37-26

CleverDATA
make your data clever


Re: State life-cycle for different state-backend implementations

gerryzhou
Hi Rinat,

I think there is a configuration option, state.checkpoints.num-retained, that controls the maximum number of completed checkpoints to retain; its default value is 1. So the risk you mentioned should not materialize. See https://ci.apache.org/projects/flink/flink-docs-release-1.5/ops/config.html#checkpointing for more checkpoint configuration options.
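
To illustrate the retention idea, here is a toy sketch in Python. It is not Flink code and does not reflect how Flink implements this internally; the class name RetainedCheckpoints and its fields are hypothetical. It only models the behaviour described above: once the number of completed checkpoints exceeds the configured limit, the oldest ones are dropped.

```python
from collections import deque


class RetainedCheckpoints:
    """Toy model of a bounded store of completed checkpoints (hypothetical, not Flink code)."""

    def __init__(self, num_retained=1):
        if num_retained < 1:
            raise ValueError("num_retained must be at least 1")
        self.num_retained = num_retained
        self.completed = deque()  # oldest checkpoint id sits on the left
        self.discarded = []       # ids whose files would be cleaned up

    def add(self, checkpoint_id):
        """Register a completed checkpoint, then evict the oldest beyond the limit."""
        self.completed.append(checkpoint_id)
        while len(self.completed) > self.num_retained:
            self.discarded.append(self.completed.popleft())


# Assumed scenario: five checkpoints complete with the limit left at 1.
store = RetainedCheckpoints(num_retained=1)
for checkpoint_id in range(1, 6):
    store.add(checkpoint_id)

print(list(store.completed))  # [5]
print(store.discarded)        # [1, 2, 3, 4]
```

With the limit at 1, only the latest completed checkpoint stays around in this model, so the number of retained checkpoints does not grow with the age of the job.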

Best, Sihua


On 06/8/2018 22:55, [hidden email] wrote:
> Hi mates, got a question about different state backends. [...]


Re: State life-cycle for different state-backend implementations

Rinat
Hi Sihua, Thx for your reply

On 9 Jun 2018, at 11:42, sihua zhou <[hidden email]> wrote:
> Hi Rinat,
> I think there is one configuration, state.checkpoints.num-retained, to control the maximum number of completed checkpoints to retain [...]

Sincerely yours,
Rinat Sharipov