Re: Calling external services/databases from DataStream API

Posted by Jonas Gröger on
URL: http://deprecated-apache-flink-user-mailing-list-archive.369.s1.nabble.com/Calling-external-services-databases-from-DataStream-API-tp11366p11367.html

I have a similar use case where I (for the purposes of this discussion) have a GeoIP database that is not fully available from the start but will eventually be "full". The GeoIP tuples come in one after another. After ~4M tuples the GeoIP database is complete.

I also need to do the same kind of query.

The way I do it right now is to connect the two streams using ipStream.connect(geoIpStream).flatMap(CODE), where CODE either looks up the country (flatMap1) or updates the GeoIP state (flatMap2). For the state I use a ValueStateDescriptor of type GeoIPDatabase, which contains all GeoIP information.
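
To make the flatMap1/flatMap2 split concrete, here is a minimal plain-Java sketch of the logic. It does not use the Flink API: the class GeoIpEnricher, the Map standing in for GeoIPDatabase, and the String-based tuples are all assumptions for illustration, not my actual job code.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Plain-Java stand-in for the two-input operator (hypothetical, no Flink types).
class GeoIpEnricher {
    // Stand-in for the GeoIPDatabase held in state (assumed layout: IP -> country).
    private final Map<String, String> geoIpDatabase = new HashMap<>();

    // Plays the role of flatMap1: look up the country for an incoming IP.
    // Emits nothing if the database does not know the IP yet.
    void flatMap1(String ip, List<String> out) {
        String country = geoIpDatabase.get(ip);
        if (country != null) {
            out.add(ip + "," + country);
        }
    }

    // Plays the role of flatMap2: add one GeoIP tuple to the database.
    void flatMap2(String ip, String country) {
        geoIpDatabase.put(ip, country);
    }

    public static void main(String[] args) {
        GeoIpEnricher enricher = new GeoIpEnricher();
        List<String> out = new ArrayList<>();

        enricher.flatMap1("1.2.3.4", out);   // database still empty, nothing emitted
        enricher.flatMap2("1.2.3.4", "DE");  // GeoIP tuple arrives
        enricher.flatMap1("1.2.3.4", out);   // now the lookup succeeds

        System.out.println(out);             // prints [1.2.3.4,DE]
    }
}
```

Note that in this sketch a lookup that arrives before its GeoIP tuple is simply dropped. Whether to drop, buffer, or emit such records with an "unknown" country is a design choice that depends on the job.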

An alternative approach would be to keep the database in a file on the filesystem (preferably in-memory) and then load it in the enrich operator.
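
A rough plain-Java sketch of that alternative, again without Flink types. The class FileBackedGeoIp, the "ip,country" line format, and the "unknown" fallback are assumptions made up for this example. In a real job the open(Path) call would sit in the operator's initialization hook.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical enrich helper that loads the whole database once, then only reads it.
class FileBackedGeoIp {
    private final Map<String, String> countryByIp = new HashMap<>();

    // Call once before any lookups; reads the complete file into memory.
    void open(Path databaseFile) {
        try {
            load(Files.readAllLines(databaseFile));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Parses lines of the assumed form "ip,country"; malformed lines are skipped.
    void load(List<String> lines) {
        for (String line : lines) {
            String[] parts = line.split(",", 2);
            if (parts.length == 2) {
                countryByIp.put(parts[0].trim(), parts[1].trim());
            }
        }
    }

    // Returns the country for the IP, or "unknown" if the IP is not in the file.
    String lookup(String ip) {
        return countryByIp.getOrDefault(ip, "unknown");
    }

    public static void main(String[] args) {
        FileBackedGeoIp db = new FileBackedGeoIp();
        db.load(List.of("1.2.3.4,DE", "5.6.7.8,US"));
        System.out.println(db.lookup("1.2.3.4")); // prints DE
        System.out.println(db.lookup("9.9.9.9")); // prints unknown
    }
}
```

The trade-off compared to the connected-streams version is that the database has to be complete before the job starts, so this only fits once all ~4M tuples are available.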