Hi there,
I have a data stream (coming from Kafka) that contains information which I want to enrich with information that sits in a database before I hand the enriched tuple over to a sink. How would I do that? I was thinking of somehow combining my streaming job with a JDBC input but wasn't very successful in getting this going.
Thanks!
Philipp
Hi Philipp,
the easiest way is a RichMap. In the open() method you can load the relevant database table into memory (e.g. into a HashMap). In the map() method you then just look up the entry in the HashMap. Of course, this only works if the dataset is small enough to fit in memory. Is it?

Cheers,
Konstantin

--
Konstantin Knauf * [hidden email] * +49-174-3413182
TNG Technology Consulting GmbH, Betastr. 13a, 85774 Unterföhring
Geschäftsführer: Henrik Klagges, Christoph Stock, Dr. Robert Dahlke
Sitz: Unterföhring * Amtsgericht München * HRB 135082
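For reference, the pattern Konstantin describes could be sketched roughly like this (a minimal, untested sketch: the JDBC URL, table and column names, and the tuple types are all made up for illustration and are not from the thread):

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;
import java.util.HashMap;
import java.util.Map;

import org.apache.flink.api.common.functions.RichMapFunction;
import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.api.java.tuple.Tuple3;
import org.apache.flink.configuration.Configuration;

// Enriches a (userId, event) tuple with a user name that is looked up
// in a map loaded once per parallel instance in open().
public class EnrichingMapper
        extends RichMapFunction<Tuple2<String, String>, Tuple3<String, String, String>> {

    private transient Map<String, String> userNames;

    @Override
    public void open(Configuration parameters) throws Exception {
        userNames = new HashMap<>();
        // Hypothetical JDBC URL and table; replace with your own.
        try (Connection conn = DriverManager.getConnection("jdbc:postgresql://localhost/mydb");
             Statement stmt = conn.createStatement();
             ResultSet rs = stmt.executeQuery("SELECT id, name FROM users")) {
            while (rs.next()) {
                userNames.put(rs.getString("id"), rs.getString("name"));
            }
        }
    }

    @Override
    public Tuple3<String, String, String> map(Tuple2<String, String> value) {
        // Lookup in the in-memory table; fall back to a marker if absent.
        String name = userNames.getOrDefault(value.f0, "unknown");
        return Tuple3.of(value.f0, value.f1, name);
    }
}
```

As Konstantin notes, this only works while the reference table fits into the memory of each parallel instance, since every instance loads its own full copy.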
Thank you Konstantin, the amount of data I have to load into memory will be very small, so that should be alright.
When opening and querying the database, would I use any sort of Flink magic or just do plain JDBC? I read about the JDBCInput concept which one could use with the DataSet API and was wondering if I could use that somehow in my open method.
Thanks
Philipp
You can just use plain JDBC. Just keep in mind that the classes will be serialized and sent through the cluster, so you probably want to initialize all the non-serializable database access objects in the open method itself (as opposed to the constructor, which runs on the client side).

Cheers,
Konstantin
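The constructor-versus-open() point can be seen in a plain-Java sketch (class name and JDBC URL are made up; only the transient/Serializable mechanics are the point):

```java
import java.io.Serializable;
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.SQLException;

// The function object is serialized on the client and shipped to the
// task managers, so only cheap, serializable configuration belongs in
// the constructor; the JDBC Connection is transient and gets created
// in open(), once per parallel instance.
public class DbBackedMapper implements Serializable {

    private final String jdbcUrl;       // serializable: survives shipping
    private transient Connection conn;  // not serializable: rebuilt per task

    public DbBackedMapper(String jdbcUrl) {
        this.jdbcUrl = jdbcUrl;         // runs on the client side
    }

    // Mirrors RichFunction.open(): runs on the task manager.
    public void open() throws SQLException {
        conn = DriverManager.getConnection(jdbcUrl);
    }

    public String getJdbcUrl() { return jdbcUrl; }

    public Connection getConnection() { return conn; }
}
```

If the Connection were created in the constructor instead, the job would fail during serialization (or ship a dead connection), because java.sql.Connection is not Serializable.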
Hi again,
I implemented the RichMap function (the open method runs a JDBC query to populate a HashMap with data) which I am using in the map function. Now there is another RichMap.map function that should add entries to the HashMap that was initialized in the first function. How would I share the map between the two functions (I am using the DataStream API)?
Thanks
Philipp
Hi Philipp,
if I got your requirements right, you would like to:
3) use the HashMap to enrich another stream.
In general, it is not possible to share local operator state among different operators (or even parallel instances of the same operator).
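One pattern that sidesteps this restriction (not proposed in the thread, just a hedged sketch) is to connect the two streams so that both inputs reach the same operator instance, which then holds the one shared map; all type parameters and names here are illustrative:

```java
import java.util.HashMap;
import java.util.Map;

import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.streaming.api.functions.co.CoFlatMapFunction;
import org.apache.flink.util.Collector;

// Input 1 carries (id, name) reference updates; input 2 carries
// (id, event) records to enrich. Both are handled by the same operator
// instance, so they see the same HashMap.
public class EnrichCoFlatMap implements
        CoFlatMapFunction<Tuple2<String, String>, Tuple2<String, String>, Tuple2<String, String>> {

    private final Map<String, String> reference = new HashMap<>();

    @Override
    public void flatMap1(Tuple2<String, String> refUpdate,
                         Collector<Tuple2<String, String>> out) {
        reference.put(refUpdate.f0, refUpdate.f1);  // update side: emits nothing
    }

    @Override
    public void flatMap2(Tuple2<String, String> event,
                         Collector<Tuple2<String, String>> out) {
        String name = reference.getOrDefault(event.f0, "unknown");
        out.collect(Tuple2.of(event.f0, event.f1 + "/" + name));
    }
}
// Usage sketch:
// referenceStream.connect(eventStream).flatMap(new EnrichCoFlatMap());
```

With parallelism greater than one, both streams would need to be keyed on the same field before connecting, so that updates and events for the same id land in the same parallel instance.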
Awesome, thanks Fabian!
I will give this a try.