Best practices for complex state manipulation


Dan
Hi!

I'm working on a join setup that does fuzzy matching in case the client does not send enough parameters to join by a foreign key. There are a few ways I can store the state, and I'm curious about best practices around this. I'm using RocksDB as the state storage.

I was reading the code for IntervalJoin and was a little shocked by the implementation.  It feels designed for very short join intervals.

I read this set of pages, but I'm looking for one level deeper. For example, what are the performance characteristics of the different types of state CRUD operations with RocksDB? I could also create an extra MapState to act as an index. When is this worth it?
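To make the index question concrete, here is a minimal sketch of the trade-off in plain Python. It is a toy in-memory model, not the Flink API: the `JoinState` class, the `session_id` field used as the fuzzy-match key, and the read/write counters are all hypothetical and exist only for illustration. The assumption behind it is that, with the RocksDB backend, each MapState entry is stored as its own key-value pair, so a point lookup touches one entry while an iteration touches every entry in the map.

```python
from collections import defaultdict


class JoinState:
    """Toy stand-in for per-key state. Hypothetical; NOT the Flink API."""

    def __init__(self):
        self.records = {}                   # primary map: record_id -> record
        self.by_session = defaultdict(set)  # extra index map: session_id -> record_ids
        self.reads = 0                      # state entries read so far
        self.writes = 0                     # state entries written so far

    def put(self, record_id, record):
        # Every insert pays for two writes: the record and its index entry.
        self.records[record_id] = record
        self.by_session[record["session_id"]].add(record_id)
        self.writes += 2

    def find_by_scan(self, session_id):
        # Without an index, every stored record has to be read and checked.
        matches = []
        for record in self.records.values():
            self.reads += 1
            if record["session_id"] == session_id:
                matches.append(record)
        return matches

    def find_by_index(self, session_id):
        # With an index: one read for the index entry, then one per match.
        self.reads += 1
        matches = []
        for record_id in sorted(self.by_session.get(session_id, set())):
            self.reads += 1
            matches.append(self.records[record_id])
        return matches


state = JoinState()
for i in range(1000):
    state.put(i, {"session_id": f"s{i % 100}", "value": i})

state.reads = 0
scan = state.find_by_scan("s7")
scan_reads = state.reads

state.reads = 0
indexed = state.find_by_index("s7")
index_reads = state.reads

print(scan_reads, index_reads)  # prints: 1000 11
```

In this model the index doubles the write cost of every insert but cuts a lookup from one read per stored record to one read per match plus one. That suggests an extra index is more likely to pay off when the state per key is large and lookups are selective, and less likely when each key holds only a handful of entries. Real costs also depend on serialization and on RocksDB's caching, which this sketch ignores.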


Re: Best practices for complex state manipulation

Tzu-Li (Gordon) Tai
Hi Dan,

For a deeper dive into state backends and how they manage state, and into performance-critical aspects such as state serialization and choosing appropriate state structures, I highly recommend starting with this webinar by my colleague Seth Wiesman: https://www.youtube.com/watch?v=9GF8Hwqzwnk.

Cheers,
Gordon

In reply to Dan Hill's message of Wed, Mar 10, 2021 at 1:58 AM.


Re: Best practices for complex state manipulation

Dan
Thanks Gordon and Seth!

In reply to Tzu-Li (Gordon) Tai's message of Wed, Mar 10, 2021, 21:55.