Persisting state in RocksDB

Posted by Paul K Moore on
URL: http://deprecated-apache-flink-user-mailing-list-archive.369.s1.nabble.com/Persisting-state-in-RocksDB-tp44305.html

Hi all,

First post here, so please be kind :)

First, some context: I have the following high-level job topology:

(1) FlinkPulsarSource -> (2) RichAsyncFunction -> (3) SinkFunction

1. The FlinkPulsarSource reads event notifications about article updates from a Pulsar topic
2. The RichAsyncFunction fetches the “full” article from the specified URL endpoint, and transforms it into a “legacy” article format
3. The SinkFunction writes the “legacy” article to a (legacy) web platform, i.e. the sink is effectively another web site

I have this all up and running (despite lots of shading fun).

When the SinkFunction creates an article on the legacy platform, it returns an ‘HTTP 201 - Created’ with a Location header suitably populated.

Now, I need to persist that Location header and, more explicitly, need to persist a map between the URLs for the new and legacy platforms. This is needed for later update and delete processing.
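
To make the requirement concrete, here is a minimal sketch of the mapping I have in mind, in plain Java with no Flink dependencies. All of the names (UrlMappingStore, InMemoryUrlMappingStore, LegacyResponseHandler) are hypothetical and exist only for illustration; the in-memory map is just a stand-in for whatever durable store ends up holding the data, and it would obviously not survive a restart.

```java
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical abstraction over wherever the URL mapping ends up being stored. */
interface UrlMappingStore {
    void put(String newUrl, String legacyUrl);
    Optional<String> legacyUrlFor(String newUrl);
    void remove(String newUrl);
}

/** In-memory stand-in (assumption): a real job would need a durable backing store. */
final class InMemoryUrlMappingStore implements UrlMappingStore {
    private final Map<String, String> mappings = new ConcurrentHashMap<>();

    @Override
    public void put(String newUrl, String legacyUrl) {
        mappings.put(newUrl, legacyUrl);
    }

    @Override
    public Optional<String> legacyUrlFor(String newUrl) {
        return Optional.ofNullable(mappings.get(newUrl));
    }

    @Override
    public void remove(String newUrl) {
        mappings.remove(newUrl);
    }
}

/** Hypothetical helper the sink could call after each create request. */
final class LegacyResponseHandler {
    private final UrlMappingStore store;

    LegacyResponseHandler(UrlMappingStore store) {
        this.store = store;
    }

    /**
     * Records newUrl -> Location when the legacy platform answers 201 Created
     * with a non-empty Location header. Returns true if a mapping was stored.
     */
    boolean onResponse(String newUrl, int status, String locationHeader) {
        if (status == 201 && locationHeader != null && !locationHeader.isEmpty()) {
            store.put(newUrl, locationHeader);
            return true;
        }
        return false;
    }
}
```

A later update or delete would then call legacyUrlFor(newUrl) to find the legacy article to modify, and remove(newUrl) once the delete has succeeded.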

The question is how do I store this mapping information?

I’ve spent some time trying to grok state management and the various backends, but from what I can see the state management is focused on “operator scoped” state. This seems reasonable given the requirement for barriers etc. to ensure accurate recovery.

However, I need some persistence between operators (shared state?) and with longevity beyond the processing of an operator.

My gut reaction is that I need an external K-V store such as Ignite (or similar). Frankly, given that Flink ships with embedded RocksDB, I was hoping to use that, but there seems to be no obvious way to do this, and lots of advice saying don’t :)

Am I missing something obvious here?

Many thanks in advance

Paul