ValueState with pure Java class keeping lists/map vs ListState/MapState, which one is a recommended way?

Elkhan Dadashov
Hi Flinkers,

I'm curious whether there is any performance (memory/speed) difference between the following two options for keeping state in window process functions:

1) Create a single ValueState<MyClass> and store the state in plain Java objects:

class MyClass {
   List<OtherClass> listOtherClass;
   Map<String, SomeOtherClass> mapKeyToSomeValue;
}

public class MyProcessFunc
      extends KeyedProcessFunction<String, X, Tuple3<Long, Long, Float>> {
...
   ValueState<MyClass> valueState;
...
}

vs

2) Create ListState and MapState as two separate Flink state variables:

public class MyProcessFunc
      extends KeyedProcessFunction<String, X, Tuple3<Long, Long, Float>> {
...
   ListState<OtherClass> listState;
   MapState<String, SomeOtherClass> mapState;
...
}

Which option is the recommended way to store state?

Thanks.

Re: ValueState with pure Java class keeping lists/map vs ListState/MapState, which one is a recommended way?

David Anderson-2

[Note that this question is better suited for the user mailing list than dev.]

In general, using ListState<T> and MapState<U, V> is recommended over ValueState<List<T>> or ValueState<Map<U, V>>.

Some state backends can optimize common access patterns for ListState and MapState in ways that are not possible for a ValueState that happens to wrap a List or Map. In particular, the RocksDB state backend can append to ListState without deserializing and reserializing the entire list, and each entry of MapState is a separate RocksDB object, so individual entries can be read and updated without deserializing the entire map.
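
To make the append case concrete, here is a toy model in plain Java. It is not Flink or RocksDB code: the class name, the method names, and the byte layout (a 4-byte count followed by 8-byte longs) are all assumptions made up for illustration. It only counts how many serialized bytes are read and written when a list is stored as one opaque value versus when each append writes only the new element.

```java
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.List;

// Toy model only (hypothetical names, not the Flink or RocksDB implementation).
class StateAccessCostSketch {

    // Serialize a list as [int count][long e0][long e1]...
    static byte[] serialize(List<Long> values) {
        ByteBuffer buf = ByteBuffer.allocate(4 + 8 * values.size());
        buf.putInt(values.size());
        for (long v : values) {
            buf.putLong(v);
        }
        return buf.array();
    }

    // Inverse of serialize: rebuild the whole list from the blob.
    static List<Long> deserialize(byte[] blob) {
        ByteBuffer buf = ByteBuffer.wrap(blob);
        int count = buf.getInt();
        List<Long> values = new ArrayList<>(count);
        for (int i = 0; i < count; i++) {
            values.add(buf.getLong());
        }
        return values;
    }

    // Pattern 1: the list lives inside a single opaque value, so every
    // append reads the whole blob and writes the whole blob back.
    static long bytesTouchedWholeValue(int appends) {
        byte[] stored = serialize(new ArrayList<>());
        long touched = 0;
        for (int i = 0; i < appends; i++) {
            touched += stored.length;              // full read
            List<Long> list = deserialize(stored);
            list.add((long) i);
            stored = serialize(list);
            touched += stored.length;              // full write
        }
        return touched;
    }

    // Pattern 2: the store supports appends, so each append only
    // serializes and writes the new element.
    static long bytesTouchedAppendOnly(int appends) {
        List<byte[]> chunks = new ArrayList<>();
        long touched = 0;
        for (int i = 0; i < appends; i++) {
            byte[] chunk = ByteBuffer.allocate(8).putLong(i).array();
            chunks.add(chunk);
            touched += chunk.length;
        }
        return touched;
    }

    public static void main(String[] args) {
        int n = 1000;
        System.out.println("whole-value rewrite: " + bytesTouchedWholeValue(n) + " bytes");
        System.out.println("append-only:         " + bytesTouchedAppendOnly(n) + " bytes");
    }
}
```

In this model, 1000 appends touch 8,008,000 bytes with the whole-value pattern and 8,000 bytes with the append-only pattern, so the first grows quadratically with the list length and the second linearly. Real numbers depend on the serializer and the state backend; with a heap-based backend, where state is kept as objects, the gap may be much smaller.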

David

(In reply to the message from Elkhan Dadashov sent Fri, Jan 17, 2020 at 8:45 PM.)