question for handling db data


question for handling db data

jaya sai
Hello,

I have a question about using Flink. We have a small data set that does not change often, and another, much larger data set that we need to compare against it.

Say I have two collections in MongoDB, geofences and locations. The geofence collection is relatively small and does not change often, but location data comes in from clients at high volume, and we need to calculate geofence entries and exits based on the geofence and location data points.
For calculating the entries and exits we were thinking of using Flink CEP. Our problem is that the geofence data sometimes changes, and we need to somehow update Flink's in-memory store.

We were thinking of bootstrapping the Flink processor's memory by loading the data on initial start, then subscribing to a Kafka topic to listen for geofence changes and re-pull the data.
Is this a valid approach?

Thank you,

Re: question for handling db data

Oytun Tez
Hi Jaya,


You'll still keep your geofence data as a stream (depending on the data and use case, perhaps the whole geofence list as a single stream element), and broadcast that stream to the downstream operators. Those operators will then hold the geofence data in their state as slow-changing data (updated in processBroadcastElement), while the user locations arrive at the operator regularly (handled in processElement).
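The broadcast state pattern described above can be sketched in plain Java, without Flink dependencies. The class and method names mirror Flink's API but are illustrative only, and the circular geofence with a flat-earth distance check is a simplifying assumption; in a real job this would be a KeyedBroadcastProcessFunction with the map held in Flink state.

```java
import java.util.*;

// A circular geofence; a simplifying assumption for this sketch.
class Geofence {
    final String id;
    final double centerLat, centerLon, radiusMeters;
    Geofence(String id, double lat, double lon, double r) {
        this.id = id; centerLat = lat; centerLon = lon; radiusMeters = r;
    }
}

// Models the operator: slow-changing geofences via "broadcast",
// high-volume locations via the regular stream.
class GeofenceOperator {
    // Plays the role of the operator's broadcast state.
    private Map<String, Geofence> geofenceState = new HashMap<>();

    // Analogous to processBroadcastElement(): replace the slow-changing data.
    void processBroadcastElement(List<Geofence> newGeofences) {
        Map<String, Geofence> fresh = new HashMap<>();
        for (Geofence g : newGeofences) fresh.put(g.id, g);
        geofenceState = fresh;
    }

    // Analogous to processElement(): evaluate one location against
    // whatever the broadcast state currently holds.
    List<String> processElement(double lat, double lon) {
        List<String> matches = new ArrayList<>();
        for (Geofence g : geofenceState.values()) {
            double dLat = lat - g.centerLat, dLon = lon - g.centerLon;
            // Crude degrees-to-meters conversion; fine for a sketch.
            double meters = Math.sqrt(dLat * dLat + dLon * dLon) * 111_000;
            if (meters <= g.radiusMeters) matches.add(g.id);
        }
        return matches;
    }
}
```

A location event only ever reads the state; only a broadcast element replaces it, which is what keeps the two inputs from racing inside a single operator instance.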





---
Oytun Tez

M O T A W O R D
The World's Fastest Human Translation Platform.


On Thu, Jul 25, 2019 at 6:48 PM jaya sai <[hidden email]> wrote:

Re: question for handling db data

Oytun Tez
Imagine an operator, a ProcessFunction, with two incoming inputs:
geofences via the broadcast stream,
user locations via the normal data stream.

Geofence updates and user location updates will arrive separately at this single operator.

1)
When a geofence update arrives via broadcast, the operator updates its state with the new geofence rules: this.geofenceListState = myNewGeofenceListState. This happens in the processBroadcastElement() method.

Whenever the geofence data is updated, the update comes to this operator, into processBroadcastElement(), and you put the new geofence list into the operator's state.

2)
When a user location update arrives at the operator via the regular stream, you access this.geofenceListState, do your calculations, and collect() whatever you need to emit at the end of the computation.

The regular stream arrives, this time, at the processElement() method.
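The entry/exit calculation itself can be sketched as a small stand-alone class. All names here are hypothetical, and a plain map stands in for Flink's per-key state: each user's previous set of containing geofences is kept and diffed against the current set to emit ENTER/EXIT events.

```java
import java.util.*;

// Sketch of ENTER/EXIT detection inside processElement().
class EntryExitDetector {
    // Stands in for Flink keyed state: userId -> geofence ids the user was in.
    private final Map<String, Set<String>> lastInside = new HashMap<>();

    // currentInside would come from checking the location against the
    // broadcast geofence state; here the caller supplies it directly.
    List<String> onLocation(String userId, Set<String> currentInside) {
        Set<String> previous = lastInside.getOrDefault(userId, Set.of());
        List<String> events = new ArrayList<>();
        for (String g : currentInside)
            if (!previous.contains(g)) events.add("ENTER:" + g);   // newly inside
        for (String g : previous)
            if (!currentInside.contains(g)) events.add("EXIT:" + g); // no longer inside
        lastInside.put(userId, new HashSet<>(currentInside));
        return events;
    }
}
```

Keying the stream by user id is what lets each operator instance keep this per-user history locally.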

-

A geofence update will not affect the elements previously collected from processElement(). But Flink will make sure that every instance of this operator, across the various task managers, receives the same geofence update via processBroadcastElement(), regardless of whether the operator is keyed.

Your processElement(), i.e. your user location updates, will do its calculation using the latest geofence data from processBroadcastElement(). Once the geofence data is updated, user location updates from that point on will use the new geofence data.
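This ordering guarantee can be shown with a tiny self-contained demo (names illustrative; the fence is reduced to a single radius and distances are precomputed for brevity): results emitted before a geofence update are unaffected, while every location processed afterwards sees the new broadcast state.

```java
import java.util.*;

// Demonstrates that a broadcast update only affects subsequent elements.
class BroadcastOrderingDemo {
    private double radiusMeters = 500;               // the "broadcast state"
    private final List<String> emitted = new ArrayList<>();

    // Analogous to processBroadcastElement(): update the slow-changing data.
    void processBroadcastElement(double newRadius) { radiusMeters = newRadius; }

    // Analogous to processElement(): classify against the current state.
    void processElement(double distanceMeters) {
        emitted.add(distanceMeters <= radiusMeters ? "INSIDE" : "OUTSIDE");
    }

    List<String> results() { return emitted; }
}
```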

I hope this is clearer.








On Thu, Jul 25, 2019 at 7:08 PM jaya sai <[hidden email]> wrote:
Hi Oytun,

Thanks for the quick reply; I will study this more.

So when we have a stream, say in Kafka, for edits, deletes, and inserts of geofences, and we add it to the Flink broadcast downstream: what happens if processing is under way and we update the bounds of a geofence?

When does the new data take effect, how do the updates happen, and is there any impact on the geofence events emitted from the location data?

Thank you,



On Thu, Jul 25, 2019 at 5:53 PM Oytun Tez <[hidden email]> wrote: