Re: Sync two DataStreams
Posted by
David Anderson-2 on
URL: http://deprecated-apache-flink-user-mailing-list-archive.369.s1.nabble.com/Sync-two-DataStreams-tp34076p34081.html
There are a few ways to pre-ingest data from a side input before beginning to process another stream. One is to use the State Processor API [1] to create a savepoint that has the data from that side input in its state. For a simple example of bootstrapping state into a savepoint, see [2].
Another approach is to buffer the stream to be validated in Flink state until the side input has been fully ingested. Alternatively, run the job once with no event traffic and take a savepoint once the model has been broadcast; the real job can then be started from that savepoint.
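To make the buffering idea concrete, here is a rough sketch of the logic in plain Java. It is deliberately not Flink API code: the class and method names (BufferUntilLoaded, onReference, onReferenceComplete, onEvent) are made up for illustration, and the "side input is complete" signal is an assumption, since how you detect that depends on your source. In a real job the map and the list would be kept in Flink managed state inside a two-input function (for example a KeyedCoProcessFunction), so that the buffered events survive a restart.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Plain-Java model of "buffer until the side input is loaded".
// Hypothetical names; in Flink the fields below would be managed state.
class BufferUntilLoaded {
    private final Map<String, String> reference = new HashMap<>(); // side-input data
    private final List<String> pending = new ArrayList<>();        // events held back
    private boolean loaded = false;                                  // side input fully read?

    // Called for each record of the side input.
    void onReference(String key, String value) {
        reference.put(key, value);
    }

    // Called once the side input is known to be complete (assumed signal).
    // Validates and emits the buffered events in arrival order.
    List<String> onReferenceComplete() {
        loaded = true;
        List<String> out = new ArrayList<>();
        for (String key : pending) {
            out.add(validate(key));
        }
        pending.clear();
        return out;
    }

    // Called for each event of the stream to be validated.
    List<String> onEvent(String key) {
        List<String> out = new ArrayList<>();
        if (loaded) {
            out.add(validate(key));
        } else {
            pending.add(key); // hold the event until the reference data is complete
        }
        return out;
    }

    private String validate(String key) {
        return key + (reference.containsKey(key) ? ":valid" : ":invalid");
    }
}
```

The thing to watch with this approach is the size of the buffer: everything that arrives before the side input is complete has to fit in state.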
Yet another solution might be to use a custom source that reads from one topic and then the other. See [3] and [4] for an example of that.
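The core of such a source is just "drain the first input, then switch to the second". A minimal plain-Java sketch of that ordering follows. It is not the code from [3] or [4], and SequentialSource is a made-up name. It also assumes the first input is bounded; with a Kafka topic you would have to decide what "finished" means, for example having reached the offsets that were current when the job started.

```java
import java.util.Iterator;
import java.util.NoSuchElementException;

// Emits every element of the first input before any element of the second.
// Hypothetical helper for illustration; a real Flink source would also need
// to checkpoint which input it is reading and its position in it.
class SequentialSource<T> implements Iterator<T> {
    private final Iterator<T> first;
    private final Iterator<T> second;

    SequentialSource(Iterator<T> first, Iterator<T> second) {
        this.first = first;
        this.second = second;
    }

    @Override
    public boolean hasNext() {
        return first.hasNext() || second.hasNext();
    }

    @Override
    public T next() {
        if (first.hasNext()) {
            return first.next();  // still reading the side input
        }
        if (second.hasNext()) {
            return second.next(); // side input exhausted, read the events
        }
        throw new NoSuchElementException();
    }
}
```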
Other references on this topic include FLIP-17 [5] and Gregory Fee's talk on bootstrapping state [6].
Regards,