Flink state: complex value state pojos vs explicitly managed fields

Frank Wilson
Hi,

Is it better to have one POJO value state with a collection inside, or an explicit state declaration for each member? For example:

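import java.util.List;

public class MyPojo {
    private long id;
    private List<Foo> foos;

    public MyPojo() {} // Flink's POJO rules require a public no-arg constructor

    // getters / setters omitted
}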

Or two managed state declarations in my process function (a ValueState for the long and a ListState for the foos).

It feels like the former is better encapsulated, but the latter gives Flink more information about the state.

Frank 

Re: Flink state: complex value state pojos vs explicitly managed fields

Timothy Victor
I would choose encapsulation if the fields are indeed related and grouping them makes sense for your model. In general, I feel it is not a good idea to let the internal mechanics of Flink (or any other framework) dictate your data model.

Tim

On Mon, Jun 17, 2019, 4:59 AM Frank Wilson wrote:
[...]

Re: Flink state: complex value state pojos vs explicitly managed fields

Congxian Qiu
Hi,
If you use the RocksDBStateBackend, one state per member will give better performance. The RocksDBStateBackend has to serialize and deserialize the key/value on every put/get, so with a single POJO value you have to (de)serialize the whole POJO, collection included, on every access.
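
To make the serialization argument concrete, here is a toy cost model in plain Java. It is not Flink code: ByteStore, runPojoLayout and runSplitLayout are hypothetical names, each Foo is assumed to serialize to a single 8-byte long, keys and Flink's real serializer overhead are ignored, and the split layout assumes an element can be appended without reading the existing list (as a RocksDB merge allows). It counts the bytes read and written when appending Foos one at a time under each layout.

```java
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy cost model (NOT Flink code): bytes moved when appending to a byte-only store.
public class StateLayoutCost {

    // Hypothetical stand-in for RocksDB: values exist only as byte[].
    static final class ByteStore {
        private final Map<String, byte[]> data = new HashMap<>();
        long bytesRead;    // bytes that had to be deserialized
        long bytesWritten; // bytes that had to be serialized

        byte[] get(String key) {
            byte[] v = data.get(key);
            if (v != null) bytesRead += v.length;
            return v;
        }

        void put(String key, byte[] value) {
            bytesWritten += value.length;
            data.put(key, value);
        }

        // Assumed merge-style append: only the new bytes are written, the old value is not read.
        void append(String key, byte[] suffix) {
            bytesWritten += suffix.length;
            byte[] old = data.getOrDefault(key, new byte[0]);
            byte[] merged = Arrays.copyOf(old, old.length + suffix.length);
            System.arraycopy(suffix, 0, merged, old.length, suffix.length);
            data.put(key, merged);
        }
    }

    static byte[] longBytes(long v) {
        return ByteBuffer.allocate(Long.BYTES).putLong(v).array();
    }

    // POJO encoding: id (8 bytes), list size (4 bytes), then 8 bytes per Foo.
    static byte[] encodePojo(long id, List<Long> foos) {
        ByteBuffer buf = ByteBuffer.allocate(Long.BYTES + Integer.BYTES + Long.BYTES * foos.size());
        buf.putLong(id).putInt(foos.size());
        for (long foo : foos) buf.putLong(foo);
        return buf.array();
    }

    static List<Long> decodeFoos(byte[] bytes) {
        ByteBuffer buf = ByteBuffer.wrap(bytes);
        buf.getLong(); // id is unchanged by an append, so skip it
        int n = buf.getInt();
        List<Long> foos = new ArrayList<>(n);
        for (int i = 0; i < n; i++) foos.add(buf.getLong());
        return foos;
    }

    // Layout A: one value holding the whole POJO, read-modify-write on every append.
    static ByteStore runPojoLayout(long id, int appends) {
        ByteStore store = new ByteStore();
        for (int i = 0; i < appends; i++) {
            byte[] old = store.get("pojo");
            List<Long> foos = (old == null) ? new ArrayList<>() : decodeFoos(old);
            foos.add((long) i);
            store.put("pojo", encodePojo(id, foos));
        }
        return store;
    }

    // Layout B: a separate id value plus an append-only list of Foos.
    static ByteStore runSplitLayout(long id, int appends) {
        ByteStore store = new ByteStore();
        store.put("id", longBytes(id));
        for (int i = 0; i < appends; i++) store.append("foos", longBytes(i));
        return store;
    }

    public static void main(String[] args) {
        ByteStore a = runPojoLayout(42L, 1000);
        ByteStore b = runSplitLayout(42L, 1000);
        System.out.println("POJO layout:  read=" + a.bytesRead + " written=" + a.bytesWritten);
        System.out.println("split layout: read=" + b.bytesRead + " written=" + b.bytesWritten);
    }
}
```

With 1000 appends this model reports read=4007988 written=4016000 for the single-POJO layout and read=0 written=8008 for the split layout: the POJO cost grows quadratically because every append rewrites the whole collection. The split layout only wins on writes, though; reading the full list back still deserializes every element.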

Best,
Congxian


On Mon, Jun 17, 2019, 8:04 PM Timothy Victor wrote:
[...]