Reg Checkpoint size using RocksDb

Anirudh Mallem
Hi,
I am experimenting with RocksDB as the state backend for my job. To test its behavior, I modified the socket word count program to store state, and I wrote a RichMapFunction that keeps its state in a ValueState whose default value is null.
For every word received, the job checks the current state: if it is null, the state is updated to a fixed value, say “abc”; if it is non-null, the state is cleared.
So if my input stream contains the word “foo” twice, the corresponding state is first set to “abc” and then cleared at the second “foo”. This behavior occurs as expected, but the checkpointed size keeps increasing! Is this expected? I believed the checkpointed size shown on the dashboard should decrease when some of the states are cleared.
If the two “foo” words arrive in successive checkpointing intervals, we should observe one rise and one fall in the checkpoint size, right? Instead, I see the checkpointed size increasing in both cases.
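
To make the set-then-clear logic concrete, here is a minimal plain-Java sketch of it. It is only a model: the class name ToggleStateModel and its methods are made up for illustration, and a HashMap stands in for the keyed ValueState, so this is not the actual Flink code of the job.

```java
import java.util.HashMap;
import java.util.Map;

// Plain-Java model of the per-word logic described above.
// Assumption: a HashMap stands in for Flink's keyed ValueState;
// this class is hypothetical and does not use the Flink API.
class ToggleStateModel {
    static final String FIXED_VALUE = "abc";

    // word -> current state; an absent key models a null (cleared) state
    private final Map<String, String> state = new HashMap<>();

    // Processes one word: sets the state if it is null, otherwise clears it.
    // Returns the state of that word after processing (null if cleared).
    String process(String word) {
        if (state.get(word) == null) {
            state.put(word, FIXED_VALUE);
        } else {
            state.remove(word);
        }
        return state.get(word);
    }

    // Number of words that currently hold a non-null state.
    int liveEntries() {
        return state.size();
    }

    public static void main(String[] args) {
        ToggleStateModel model = new ToggleStateModel();
        System.out.println(model.process("foo")); // first "foo": state is set
        System.out.println(model.process("foo")); // second "foo": state is cleared
        System.out.println(model.liveEntries());  // no live state remains
    }
}
```

In this model the amount of live state goes up by one entry and then back to zero, which is the rise and fall I expected to see in the checkpoint size.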

Any ideas as to what is happening here? My checkpoint duration is 5 secs. Thanks.

Regards,
Anirudh


Re: Reg Checkpoint size using RocksDb

Stephan Ewen
Hi!

If you use the default checkpoint mode, Flink will checkpoint the current RocksDB instance. It may be that there simply has not been a compaction in RocksDB when checkpointing, so the checkpoint contains some "old data" as well.

If you switch to the "fully async" mode, it should always only checkpoint the latest state of RocksDB.
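
As a rough illustration of why "old data" can linger: in a log-structured store, a delete is typically recorded as an extra marker (a tombstone) rather than erasing the earlier write, and the stale records only disappear when compaction rewrites the data. The sketch below is a toy model under that assumption. The class ToyLsmStore and its methods are hypothetical, and this is not how RocksDB or Flink are actually implemented.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Toy append-only store illustrating tombstones and compaction.
// Hypothetical model for illustration only, not RocksDB's implementation.
class ToyLsmStore {
    // One log record; a null value marks a tombstone (delete marker).
    private static final class Entry {
        final String key;
        final String value;

        Entry(String key, String value) {
            this.key = key;
            this.value = value;
        }
    }

    private final List<Entry> log = new ArrayList<>();

    void put(String key, String value) {
        log.add(new Entry(key, value));
    }

    // A delete appends a tombstone instead of erasing older records.
    void delete(String key) {
        log.add(new Entry(key, null));
    }

    // Number of records a copy of the raw data would contain.
    int rawRecords() {
        return log.size();
    }

    // Latest value per key, with deleted keys dropped.
    Map<String, String> liveView() {
        Map<String, String> live = new LinkedHashMap<>();
        for (Entry e : log) {
            if (e.value == null) {
                live.remove(e.key);
            } else {
                live.put(e.key, e.value);
            }
        }
        return live;
    }

    // Compaction rewrites the log so that it holds only live records.
    void compact() {
        Map<String, String> live = liveView();
        log.clear();
        for (Map.Entry<String, String> e : live.entrySet()) {
            log.add(new Entry(e.getKey(), e.getValue()));
        }
    }

    public static void main(String[] args) {
        ToyLsmStore store = new ToyLsmStore();
        store.put("foo", "abc");
        System.out.println("raw after put: " + store.rawRecords());
        store.delete("foo");
        System.out.println("raw after delete: " + store.rawRecords());
        System.out.println("live after delete: " + store.liveView().size());
        store.compact();
        System.out.println("raw after compaction: " + store.rawRecords());
    }
}
```

In this model, copying the raw records corresponds to checkpointing the current instance as it is, so the size grows on the delete as well, while iterating over liveView() corresponds to writing out only the latest state.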

Best,
Stephan


On Mon, Dec 19, 2016 at 10:47 AM, Anirudh Mallem <[hidden email]> wrote:
[...]




Re: Reg Checkpoint size using RocksDb

Anirudh Mallem
Hi Stephan,
Thanks for your response. I will try switching to the fully async mode and see.
On another note, is there any option to configure RocksDB compaction when using the default checkpointing mode? Thanks.

From: Stephan Ewen
Reply-To: "[hidden email]"
Date: Monday, December 19, 2016 at 11:51 AM
To: "[hidden email]"
Subject: Re: Reg Checkpoint size using RocksDb

[...]