Persisting state in RocksDB
Posted by Paul K Moore on
URL: http://deprecated-apache-flink-user-mailing-list-archive.369.s1.nabble.com/Persisting-state-in-RocksDB-tp44305.html
Hi all,
First post here, so please be kind :)
First, some context. I have the following high-level job topology:
(1) FlinkPulsarSource -> (2) RichAsyncFunction -> (3) SinkFunction
1. The FlinkPulsarSource reads event notifications about article updates from a Pulsar topic
2. The RichAsyncFunction fetches the “full” article from the specified URL end-point, and transmutes it into a “legacy” article format
3. The SinkFunction writes the “legacy” article to a (legacy) web platform, i.e. the sink is effectively another website
I have this all up and running (despite lots of shading fun).
When the SinkFunction creates an article on the legacy platform, it returns an ‘HTTP 201 - Created’ with a suitably populated Location header.
Now, I need to persist that Location header and, more explicitly, need to persist a map between the URLs for the new and legacy platforms. This is needed for later update and delete processing.
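To make the requirement concrete, here is a minimal sketch of the piece that pulls the legacy URL out of the create response. It deliberately has no Flink or HTTP-client dependency: the class name CreatedLocation, its from method, and the example URLs are hypothetical, and the response is assumed to be available as a status code plus a map of header names to values.

```java
import java.util.List;
import java.util.Map;
import java.util.Optional;

/** Hypothetical helper: extracts the legacy URL from a create response. */
final class CreatedLocation {
    private CreatedLocation() {}

    /**
     * Returns the first Location header value when the status is 201 Created,
     * otherwise an empty Optional. Header names are compared case-insensitively,
     * since HTTP header names are not case-sensitive.
     */
    static Optional<String> from(int statusCode, Map<String, List<String>> headers) {
        if (statusCode != 201) {
            return Optional.empty();
        }
        return headers.entrySet().stream()
                .filter(e -> e.getKey() != null && e.getKey().equalsIgnoreCase("Location"))
                .flatMap(e -> e.getValue().stream())
                .findFirst();
    }
}
```

The value returned here, paired with the URL of the article on the new platform, is the mapping entry that needs to survive for later updates and deletes.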
The question is how do I store this mapping information?
I’ve spent some time trying to grok state management and the various backends, but from what I can see, state management is focused on “operator scoped” state. This seems reasonable given the requirement for barriers, etc., to ensure accurate recovery.
However, I need some persistence between operators (shared state?) and with longevity beyond the processing of an operator.
My gut reaction is that I need an external K-V store such as Ignite (or similar). Frankly, given that Flink ships with embedded RocksDB, I was hoping to use that, but there seems to be no obvious way to do this, and lots of advice saying don’t :)
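For what it’s worth, the shape I have in mind for the external store is roughly the following. This is only a sketch under assumptions: UrlMappingStore and InMemoryUrlMappingStore are hypothetical names, not Flink or Ignite APIs, and the in-memory version is merely a stand-in that a real implementation backed by Ignite (or similar) would replace.

```java
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical abstraction over wherever the URL mapping ends up living. */
interface UrlMappingStore {
    /** Records that the article at newUrl was created at legacyUrl. */
    void put(String newUrl, String legacyUrl);

    /** Looks up the legacy URL for an update or delete notification. */
    Optional<String> legacyUrlFor(String newUrl);

    /** Forgets the mapping, e.g. after the legacy article has been deleted. */
    void remove(String newUrl);
}

/** In-memory stand-in for local testing only; nothing here survives a restart. */
final class InMemoryUrlMappingStore implements UrlMappingStore {
    private final Map<String, String> mappings = new ConcurrentHashMap<>();

    @Override
    public void put(String newUrl, String legacyUrl) {
        mappings.put(newUrl, legacyUrl);
    }

    @Override
    public Optional<String> legacyUrlFor(String newUrl) {
        return Optional.ofNullable(mappings.get(newUrl));
    }

    @Override
    public void remove(String newUrl) {
        mappings.remove(newUrl);
    }
}
```

The sink would call put after a successful create, and later update or delete processing would call legacyUrlFor to find the target on the legacy platform.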
Am I missing something obvious here?
Many thanks in advance
Paul