http://deprecated-apache-flink-user-mailing-list-archive.369.s1.nabble.com/Persisting-state-in-RocksDB-tp44305p44364.html
Hi Paul,
You can leave operators dangling, so there is no need to add fake sinks.
If you write to HTTP, the best option is actually async I/O [1]. This will run much, much faster.
Async I/O, however, has no state access (we want to change that eventually, but for now the restriction is there to avoid too many antipatterns).
For me it's not clear how exactly you want to use the state. If it's to avoid duplicate requests, or because the requests look different, then I'd propose:
Source -> Async (transform) -> keyBy+KeyedProcessFunction (deduplicate or update) -> Async (submit)
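
To make the shape of that pipeline concrete, here is a minimal plain-Java sketch, not Flink code. It assumes a hypothetical transform (trim and lowercase) and a hypothetical submit that stands in for the HTTP call; CompletableFuture plays the role of the async stages and a HashSet plays the role of the keyed state in the middle stage. All class and method names are made up for illustration.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;
import java.util.concurrent.CompletableFuture;
import java.util.stream.Collectors;

class PipelineSketch {
    // Stage 1, async transform: hypothetical normalization run off-thread.
    static CompletableFuture<String> transform(String raw) {
        return CompletableFuture.supplyAsync(() -> raw.trim().toLowerCase(Locale.ROOT));
    }

    // Stage 3, async submit: hypothetical stand-in for the HTTP request.
    static CompletableFuture<String> submit(String request) {
        return CompletableFuture.supplyAsync(() -> "sent:" + request);
    }

    static List<String> run(List<String> source) {
        // Start every transform before waiting on any of them.
        List<CompletableFuture<String>> transformed =
                source.stream().map(PipelineSketch::transform).collect(Collectors.toList());

        // Stage 2, stateful step: only the first request per key is passed on.
        // Here the request itself is the key; the set stands in for keyed state.
        Set<String> seen = new HashSet<>();
        List<CompletableFuture<String>> submitted = new ArrayList<>();
        for (CompletableFuture<String> f : transformed) {
            String request = f.join();
            if (seen.add(request)) {
                submitted.add(submit(request));
            }
        }
        // Wait for all submissions and collect their results in order.
        return submitted.stream().map(CompletableFuture::join).collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println(run(List.of(" A ", "a", "B"))); // [sent:a, sent:b]
    }
}
```

The point of the split is that only the middle stage touches state, so the two slow stages can overlap their waiting time freely.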
For the transition from DB-style state access, it helps to really just think in terms of a single key: what would an operator see if it only received records with the same single key? So you probably have a value state which holds information about the previous articles with the same key.
For deduplication, you just need a boolean: if the state is present, you don't emit anything in the process function. If not, you set the state and emit.
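
As a rough sketch of that deduplication logic in plain Java (not the Flink API): the HashMap below is an assumption that stands in for a per-key boolean value state, which Flink would scope to the current key for you. The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Plain-Java stand-in for a keyed process function that deduplicates.
class DedupSketch {
    // Assumption: one "seen" flag per key, simulating keyed value state.
    private final Map<String, Boolean> seen = new HashMap<>();

    // Returns the records to emit for one incoming (key, value) pair.
    List<String> process(String key, String value) {
        List<String> out = new ArrayList<>();
        if (seen.get(key) == null) { // state absent: first record for this key
            seen.put(key, true);     // set the state
            out.add(value);          // and emit
        }
        return out;                  // state present: nothing is emitted
    }

    public static void main(String[] args) {
        DedupSketch dedup = new DedupSketch();
        System.out.println(dedup.process("article-1", "v1")); // [v1]
        System.out.println(dedup.process("article-1", "v1")); // []
        System.out.println(dedup.process("article-2", "v1")); // [v1]
    }
}
```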
For diffs, you'd fetch the old state value and compare it to the new value. Then you would also update the old state with the new value and emit the diff.
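
A minimal plain-Java sketch of the diff case, under some assumptions: an article is modeled as a map of field names to values, and the "diff" is taken to be the fields that are new or changed (removed fields are ignored here). The outer HashMap again stands in for Flink's per-key value state, and all names are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

// Plain-Java stand-in for a keyed process function that emits diffs.
class DiffSketch {
    // Assumption: last seen version of each article, simulating keyed value state.
    private final Map<String, Map<String, String>> previous = new HashMap<>();

    // Returns the fields of `current` that are new or changed compared to the stored state.
    Map<String, String> process(String key, Map<String, String> current) {
        Map<String, String> old = previous.getOrDefault(key, Map.of()); // fetch old state
        Map<String, String> diff = new HashMap<>();
        for (Map.Entry<String, String> e : current.entrySet()) {
            if (!Objects.equals(old.get(e.getKey()), e.getValue())) {
                diff.put(e.getKey(), e.getValue());
            }
        }
        previous.put(key, new HashMap<>(current)); // update state with the new value
        return diff;                               // an empty diff means nothing changed
    }

    public static void main(String[] args) {
        DiffSketch differ = new DiffSketch();
        differ.process("article-1", Map.of("title", "T1", "body", "B1"));
        System.out.println(differ.process("article-1", Map.of("title", "T2", "body", "B1"))); // {title=T2}
    }
}
```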
For deleting the old article and replacing it with a new article, you'd fetch the old state, emit a record (delete, old), update state, emit record (put, new).
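
The delete-then-put case can be sketched the same way in plain Java. This is only an illustration: records are encoded as hypothetical "delete:..." and "put:..." strings, and the HashMap stands in for per-key value state that Flink would manage.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Plain-Java stand-in for a keyed process function that replaces articles.
class ReplaceSketch {
    // Assumption: currently stored article per key, simulating keyed value state.
    private final Map<String, String> stored = new HashMap<>();

    // Returns the records to emit for one incoming (key, article) pair.
    List<String> process(String key, String article) {
        List<String> out = new ArrayList<>();
        String old = stored.get(key);   // fetch the old state
        if (old != null) {
            out.add("delete:" + old);   // emit (delete, old) if there was one
        }
        stored.put(key, article);       // update the state
        out.add("put:" + article);      // emit (put, new)
        return out;
    }

    public static void main(String[] args) {
        ReplaceSketch replacer = new ReplaceSketch();
        System.out.println(replacer.process("article-1", "v1")); // [put:v1]
        System.out.println(replacer.process("article-1", "v2")); // [delete:v1, put:v2]
    }
}
```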
Flink then takes care of ensuring that the different keys are processed in parallel without interference and also manages the persistence to RocksDB.