Hello,
I have a question on using flink, we have a small data set which does not change often but have another data set which we need to compare with it and it has lots of data let say I have two collections geofence and locations in mongodb. Geofence collection does not change often and relatively small, but we have location data coming in at high amounts from clients and we need to calculate the goefence entry exits based on geofence and location data point. For calculating the entry and exit we were thinking of using flink CEP. But our problem is sometimes geofence data changes and we need to update the in memory store of the flink somehow we were thinking of bootstrapping the memory of flink processor by loading data on initial start and subscribe to kafaka topic to listen for geofence changes and re-pull the data Is this a valid approach ? Thank you, |
Hi Jaya, Broadcast pattern may help here. Take a look at this: https://ci.apache.org/projects/flink/flink-docs-release-1.8/dev/stream/state/broadcast_state.html You'll still keep your geofence data as a stream (depending on the data and use case, maybe the whole list of geofence as a single stream item), broadcast the stream to downstream operators, which will now have geofence data in their state as their slow changing data (processBroadcastElement), and the user location regularly coming to the operator (processElement). --- Oytun Tez M O T A W O R D The World's Fastest Human Translation Platform. On Thu, Jul 25, 2019 at 6:48 PM jaya sai <[hidden email]> wrote:
|
imagine an operator, ProcessFunction, it has 2 incoming data: geofences via broadcast, user location via normal data stream geofence updates and user location updates will come separately into this single operator. 1) when geofence update comes via broadcast, the operator will update its state with the new geofence rules. this.geofenceListState = myNewGeofenceListState this happens in processBroadcastElement() method. whenever geofence data is updated, it will come to this operator, into processBroadcastElement, and you will put the new geofence list into the operator's state. 2) when user location update comes to the operator, via regular stream, you will access this.geofenceListState and do your calculations and collect() whatever you need to collect at the end of computation. regular stream comes, this time, to processElement() method. - geofence update will not affect the previously collected elements from processElement. but Flink will make sure all of instances of this operator in various task managers will get the same geofence update via processBroadcastElement(), no matter whether the operator is keyed. your processElement, meaning your user location updates, will do its calculation via the latest geofence data from processBroadcastElement. when geofence data is updated, user location updates from that point on will use the new geofence data. i hope this is more clear... --- Oytun Tez M O T A W O R D The World's Fastest Human Translation Platform. On Thu, Jul 25, 2019 at 7:08 PM jaya sai <[hidden email]> wrote:
|
Free forum by Nabble | Edit this page |