State snapshotting when source is finite
Posted by Flavio Pompermaier
URL: http://deprecated-apache-flink-user-mailing-list-archive.369.s1.nabble.com/State-snapshotting-when-source-is-finite-tp16398.html
Hi to all,
in my current use case I'd like to improve one step of our batch pipeline.
There's one specific job that ingests a tabular dataset (of Rows) and explodes it into a set of RDF statements (as Tuples). The objects we output are containers of those Tuples (grouped by a field).
Flink stateful streaming could be a perfect fit here: we would incrementally grow the state of those containers inside Flink, without spending a lot of time on GET operations against an external key-value store.
The big problem is that the sources are finite and the state of the job is lost once the job ends, whereas I expected Flink to snapshot the state of its operators before exiting.
Do you think it would be possible to support such a use case (which we can summarize as "periodic batch jobs that pick up where they left off")?
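
To make the desired behaviour concrete, here is a minimal sketch in plain Java. It is not Flink API: the class name `ResumableBatch`, the tab-separated snapshot file, and the sample statements are all assumptions made up for illustration. Each run restores the containers written by the previous run, folds the new statements into them by key, and writes a final snapshot before exiting.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Illustration only: plain Java, no Flink. All names here are hypothetical.
class ResumableBatch {

    // Rebuild the per-key containers from a snapshot file whose lines are
    // "key<TAB>statement". A missing or empty file yields empty state.
    static Map<String, List<String>> restore(Path snapshot) throws IOException {
        Map<String, List<String>> state = new TreeMap<>();
        if (Files.exists(snapshot)) {
            for (String line : Files.readAllLines(snapshot)) {
                String[] parts = line.split("\t", 2);
                state.computeIfAbsent(parts[0], k -> new ArrayList<>()).add(parts[1]);
            }
        }
        return state;
    }

    // Fold a finite batch of (key, statement) pairs into the containers.
    static void ingest(Map<String, List<String>> state, List<String[]> statements) {
        for (String[] s : statements) {
            state.computeIfAbsent(s[0], k -> new ArrayList<>()).add(s[1]);
        }
    }

    // Write the state out when the finite input is exhausted.
    // Assumes keys contain no tabs and statements contain no newlines.
    static void snapshot(Map<String, List<String>> state, Path snapshot) throws IOException {
        List<String> lines = new ArrayList<>();
        state.forEach((key, values) -> values.forEach(v -> lines.add(key + "\t" + v)));
        Files.write(snapshot, lines);
    }

    public static void main(String[] args) throws IOException {
        Path file = Files.createTempFile("containers", ".tsv");

        // First periodic run: starts from empty state.
        Map<String, List<String>> run1 = restore(file);
        ingest(run1, List.of(new String[]{"s1", "p1 o1"}, new String[]{"s2", "p1 o2"}));
        snapshot(run1, file);

        // Second periodic run: picks up the containers left by the first.
        Map<String, List<String>> run2 = restore(file);
        ingest(run2, List.of(new String[]{"s1", "p2 o3"}));
        snapshot(run2, file);

        System.out.println(run2);
        Files.delete(file);
    }
}
```

In the second run the container for `s1` holds both the statement from the first run and the new one. That is the behaviour I was hoping to get from Flink's operator state when a finite source ends.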
Best,
Flavio