Query regarding state backend for Custom Map Function

Anirudh Mallem
Hi everyone,
I am trying to understand the "Working with State" page of the Flink documentation.
My question: if I use a ValueState in my CustomMap class to store my state, with RocksDB as the state backend, then it is clear that every state value is stored in RocksDB.
Now, if instead of a ValueState I use a plain Java HashMap to store my state and implement the Checkpointed interface, will the entire HashMap reside in the RocksDB backend, or will the HashMap stay in memory with only the snapshots sent to RocksDB? I am trying to understand what I would lose or gain by using my own data structure for state maintenance. Thanks.

Regards,
Anirudh 

Re: Query regarding state backend for Custom Map Function

Stefan Richter
Hi,

Using a ValueState with RocksDB to store a map inside the value state means that you have a separate map for each key, and the right map is swapped in automatically for every record, based on that record's key. If you use a map together with Checkpointed, there is only one map per parallel operator instance, and your code is responsible for dispatching state between the different keys.
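To make the two styles concrete, here is a plain-Java model with no Flink dependency. `ScopedValueState` is a hypothetical stand-in for keyed ValueState: a simulated "runtime" sets the current key before each record, so the user function only ever sees that key's value. The second style keeps one map per operator instance and the function indexes it by key itself. This is a sketch of the semantics under those assumptions, not Flink's implementation (in particular, the model keeps everything in a heap HashMap rather than in RocksDB).

```java
import java.util.HashMap;
import java.util.Map;

public class KeyedStateModel {

    /**
     * Hypothetical stand-in for keyed ValueState: the "runtime" sets the current key
     * before each record, so value()/update() only ever touch that key's entry.
     */
    static final class ScopedValueState<K, V> {
        private final Map<K, V> backing = new HashMap<>();
        private K currentKey;

        void setCurrentKey(K key) { this.currentKey = key; } // the framework's job, not user code
        V value() { return backing.get(currentKey); }
        void update(V value) { backing.put(currentKey, value); }
    }

    /** Style 1: the user function sees a single value; key scoping is implicit. */
    static long countWithScopedState(ScopedValueState<String, Long> state) {
        Long current = state.value();
        long next = (current == null ? 0L : current) + 1;
        state.update(next);
        return next;
    }

    /** Style 2: one map per operator instance; the user function dispatches by key. */
    static long countWithSingleMap(Map<String, Long> counts, String key) {
        return counts.merge(key, 1L, Long::sum);
    }

    public static void main(String[] args) {
        ScopedValueState<String, Long> state = new ScopedValueState<>();
        Map<String, Long> singleMap = new HashMap<>();

        for (String key : new String[] {"a", "b", "a", "a", "b"}) {
            state.setCurrentKey(key); // simulated per-record key switch
            long viaState = countWithScopedState(state);
            long viaMap = countWithSingleMap(singleMap, key);
            System.out.println(key + " -> " + viaState + " / " + viaMap);
        }
    }
}
```

Both columns print the same counts (a reaches 3, b reaches 2); the difference is only in who performs the per-key lookup: the runtime in the first style, your own code in the second.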

If you use a map and Checkpointed, the map lives on the heap and the checkpoint is written directly to the filesystem. This is independent of the chosen backend, so RocksDB is not involved.
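A minimal plain-Java model of the heap-map-plus-Checkpointed approach follows. `SnapshotHook` is a hypothetical interface that roughly mirrors the shape of Flink 1.1's `Checkpointed<T extends Serializable>` (`snapshotState(long checkpointId, long checkpointTimestamp)` and `restoreState(T state)`); it is not the real Flink type. Java serialization into a byte array stands in for the checkpoint file. The sketch illustrates that the working map stays on the heap and that every checkpoint hands over a copy of the whole map.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.HashMap;

public class HeapMapSnapshotModel {

    /** Hypothetical interface shaped like Flink 1.1's Checkpointed<T>; not the real Flink type. */
    interface SnapshotHook<T extends Serializable> {
        T snapshotState(long checkpointId, long checkpointTimestamp) throws Exception;
        void restoreState(T state) throws Exception;
    }

    /** Keeps one heap HashMap per operator instance, holding every key it has seen. */
    static final class CountingFunction implements SnapshotHook<HashMap<String, Long>> {
        private HashMap<String, Long> counts = new HashMap<>();

        long map(String key) {
            return counts.merge(key, 1L, Long::sum);
        }

        @Override
        public HashMap<String, Long> snapshotState(long checkpointId, long checkpointTimestamp) {
            return new HashMap<>(counts); // a copy of the whole map, at every checkpoint
        }

        @Override
        public void restoreState(HashMap<String, Long> state) {
            counts = new HashMap<>(state);
        }
    }

    /** Stand-in for writing a checkpoint file: plain Java serialization into memory. */
    static byte[] writeSnapshot(Serializable snapshot) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(snapshot);
        }
        return bytes.toByteArray();
    }

    @SuppressWarnings("unchecked")
    static HashMap<String, Long> readSnapshot(byte[] data) throws IOException, ClassNotFoundException {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(data))) {
            return (HashMap<String, Long>) in.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        CountingFunction fn = new CountingFunction();
        for (String key : new String[] {"a", "b", "a"}) {
            fn.map(key);
        }

        byte[] checkpoint = writeSnapshot(fn.snapshotState(1L, System.currentTimeMillis()));

        CountingFunction recovered = new CountingFunction(); // e.g. after a failure
        recovered.restoreState(readSnapshot(checkpoint));
        System.out.println(recovered.map("a")); // prints 3
    }
}
```

The trade-off this makes visible: reads and writes are plain heap operations, but the whole map must fit in memory and is serialized in full on every checkpoint.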

On a further note, we are working on MapState, an alternative to ValueState. Unlike ValueState, MapState does not deserialize the whole map on every access; it can access individual key/value pairs. This might be what you are looking for.
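The access-cost difference described above can be sketched with a small model that counts the bytes deserialized per read. `WholeMapValue` stands in for a ValueState holding a whole map on RocksDB (one serialized blob), and `PerEntryMap` for a MapState-like store in which each entry is serialized on its own. Both classes and the byte counting are hypothetical illustrations written for this example, not Flink code; the model also keeps the outer keys unserialized for simplicity.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;
import java.util.HashMap;
import java.util.Map;

public class MapStateAccessModel {

    static byte[] serialize(Serializable value) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(value);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    static Object deserialize(byte[] data) {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(data))) {
            return in.readObject();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    /** Like a ValueState holding a whole map: the map is one serialized blob. */
    static final class WholeMapValue {
        private byte[] blob = serialize(new HashMap<String, Long>());
        long bytesRead = 0;

        @SuppressWarnings("unchecked")
        private HashMap<String, Long> load() {
            bytesRead += blob.length; // every access pays for the entire map
            return (HashMap<String, Long>) deserialize(blob);
        }

        Long get(String key) { return load().get(key); }

        void put(String key, Long value) {
            HashMap<String, Long> map = load();
            map.put(key, value);
            blob = serialize(map); // ...and writes the entire map back
        }
    }

    /** Like a MapState: each entry is serialized and stored on its own. */
    static final class PerEntryMap {
        private final Map<String, byte[]> entries = new HashMap<>();
        long bytesRead = 0;

        Long get(String key) {
            byte[] blob = entries.get(key);
            if (blob == null) return null;
            bytesRead += blob.length; // only this one entry is deserialized
            return (Long) deserialize(blob);
        }

        void put(String key, Long value) { entries.put(key, serialize(value)); }
    }

    public static void main(String[] args) {
        WholeMapValue whole = new WholeMapValue();
        PerEntryMap perEntry = new PerEntryMap();
        for (long i = 0; i < 1_000; i++) {
            whole.put("k" + i, i);
            perEntry.put("k" + i, i);
        }
        whole.bytesRead = 0;
        perEntry.bytesRead = 0;

        Long a = whole.get("k42");
        Long b = perEntry.get("k42");
        System.out.println(a + " " + b + " | bytes deserialized: "
                + whole.bytesRead + " vs " + perEntry.bytesRead);
    }
}
```

Both stores return the same value, but the whole-map read grows with the number of entries, while the per-entry read stays roughly constant.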

Best,
Stefan


In reply to Anirudh Mallem's message of 01.12.2016, 09:35.


Re: Query regarding state backend for Custom Map Function

Anirudh Mallem
Thanks a lot, Stefan. That answers my question. Will the MapState functionality be part of the 1.2 release?

In reply to Stefan Richter's message of Thursday, December 1, 2016, 2:53 AM.


Re: Query regarding state backend for Custom Map Function

Stefan Richter
Hi,

Unfortunately, I think it is unlikely that it will still make it into 1.2.

Best,
Stefan

In reply to Anirudh Mallem's message of 01.12.2016, 20:29.
