Re: Batch job per stream message?
Posted by Fabian Hueske-2
URL: http://deprecated-apache-flink-user-mailing-list-archive.369.s1.nabble.com/Batch-job-per-stream-message-tp16485p16499.html
Hi Tomas,
triggering a batch DataSet job from a DataStream program for each input record doesn't sound like a good idea to me.
You would have to make sure that the cluster always has sufficient resources, and you would have to handle failures of those jobs yourself.
It would be preferable to do all of the data processing in a single DataStream job. You mentioned that the challenge is to join the data from the files with a table in a JDBC database.
I see two ways to do that in a DataStream program:
- replicate the JDBC table in a stateful operator. This means that you have to publish updates to the database table as a second stream that the Flink program consumes (see the first sketch below).
- query the JDBC table with an AsyncFunction. This operator executes multiple calls to the external service concurrently, which improves latency and throughput, and it ensures that checkpoints and watermarks are handled correctly (see the second sketch below).
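Here is a minimal sketch of the first option. It assumes hypothetical types FileRecord, DbRow, and EnrichedRecord that share a String id, and it assumes that the table updates arrive as a second DataStream (for example from a CDC feed or a Kafka topic):

import org.apache.flink.api.common.state.ValueState;
import org.apache.flink.api.common.state.ValueStateDescriptor;
import org.apache.flink.configuration.Configuration;
import org.apache.flink.streaming.api.functions.co.CoProcessFunction;
import org.apache.flink.util.Collector;

// Joins the file records (input 1) with a replicated copy of the JDBC table (input 2).
// The latest row of the table is kept in keyed state and updated whenever a change arrives.
public class ReplicatedTableJoin
        extends CoProcessFunction<FileRecord, DbRow, EnrichedRecord> {

    private transient ValueState<DbRow> tableRow;

    @Override
    public void open(Configuration parameters) {
        tableRow = getRuntimeContext().getState(
                new ValueStateDescriptor<>("table-row", DbRow.class));
    }

    @Override
    public void processElement1(FileRecord record, Context ctx,
                                Collector<EnrichedRecord> out) throws Exception {
        DbRow row = tableRow.value();
        if (row != null) {
            out.collect(new EnrichedRecord(record, row));
        }
        // else: decide whether to drop, buffer, or emit the record un-enriched
    }

    @Override
    public void processElement2(DbRow update, Context ctx,
                                Collector<EnrichedRecord> out) throws Exception {
        // Apply the database change to the replicated state.
        tableRow.update(update);
    }
}

You would wire it up with something like fileRecords.connect(dbUpdates).keyBy(r -> r.id, u -> u.id).process(new ReplicatedTableJoin()).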
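And a minimal sketch of the second option with a RichAsyncFunction. The connection URL, query, and record types are again just placeholders, and the sketch uses the newer ResultFuture callback (older Flink releases called it AsyncCollector):

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.util.Collections;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

import org.apache.flink.configuration.Configuration;
import org.apache.flink.streaming.api.functions.async.ResultFuture;
import org.apache.flink.streaming.api.functions.async.RichAsyncFunction;

// Looks up each file record in the JDBC table via Flink's async I/O.
public class JdbcLookupFunction extends RichAsyncFunction<FileRecord, EnrichedRecord> {

    private transient ExecutorService executor;

    @Override
    public void open(Configuration parameters) {
        // JDBC calls are blocking, so run them on a small thread pool.
        executor = Executors.newFixedThreadPool(10);
    }

    @Override
    public void asyncInvoke(FileRecord record, ResultFuture<EnrichedRecord> resultFuture) {
        CompletableFuture
                .supplyAsync(() -> queryDatabase(record), executor)
                .whenComplete((enriched, error) -> {
                    if (error != null) {
                        resultFuture.completeExceptionally(error);
                    } else {
                        resultFuture.complete(Collections.singleton(enriched));
                    }
                });
    }

    private EnrichedRecord queryDatabase(FileRecord record) {
        // Placeholder connection URL and query; use a connection pool in a real job.
        try (Connection conn = DriverManager.getConnection("jdbc:postgresql://dbhost:5432/mydb");
             PreparedStatement stmt =
                     conn.prepareStatement("SELECT * FROM lookup_table WHERE id = ?")) {
            stmt.setString(1, record.id);
            try (ResultSet rs = stmt.executeQuery()) {
                rs.next();
                // DbRow.fromResultSet is a hypothetical helper for this sketch.
                return new EnrichedRecord(record, DbRow.fromResultSet(rs));
            }
        } catch (Exception e) {
            throw new RuntimeException("JDBC lookup failed", e);
        }
    }

    @Override
    public void close() {
        executor.shutdown();
    }
}

You would apply it with AsyncDataStream.unorderedWait(fileRecords, new JdbcLookupFunction(), 5, TimeUnit.SECONDS, 100).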
Best, Fabian