FlinkCEP for large key spaces and long timeouts between events

FlinkCEP for large key spaces and long timeouts between events

David Koch
Hello,

Is FlinkCEP applicable to large key spaces with potentially long timeouts between events that define a pattern? Ideally, without ridiculous hardware.

More concretely, we segment users (one key per user) based on sequences of events for that user.

A segment such as "Abandoned Cart" may be defined as a user adding items to the cart during a browsing session, with no purchase event within the following 5 days. The number of users is between 1 and 10 million.

Is this type of segmentation scenario a viable use case for FlinkCEP?

We currently segment by building incremental profiles in ES which are then "matched against segment definition queries" using ES percolators. In short, we incur costs when interacting with ES.

Regards,

David


PS: Thanks for FlinkForward 2016, very interesting presentations and, equally important, excellent catering ;-)

Re: FlinkCEP for large key spaces and long timeouts between events

Till Rohrmann
Hi David,

You should be able to solve this kind of problem with Flink's CEP library. The important thing is to define a time interval for the pattern so that partial matches can time out. Otherwise, you will end up accumulating state that is never purged, which will eventually cause an out-of-memory (OOM) error.
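To illustrate why a timeout bounds the state, here is a minimal plain-Java sketch. It is not the FlinkCEP API; the class `AbandonedCartTracker`, its method names, and the choice to start the 5-day clock at the first add-to-cart event are all assumptions made for illustration. It keeps one open partial match per user and purges it either on purchase or once the timeout elapses, so the map never grows without limit.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.Iterator;
import java.util.List;
import java.util.Map;

/** Plain-Java illustration only; hypothetical names, NOT the FlinkCEP API. */
class AbandonedCartTracker {
    /** Assumed pattern timeout: 5 days, in milliseconds. */
    static final long TIMEOUT_MS = 5L * 24 * 60 * 60 * 1000;

    // userId -> timestamp of the first add-to-cart event of an open partial match
    private final Map<String, Long> openCarts = new HashMap<>();

    /** Opens a partial match for the user if none is open yet. */
    void onAddToCart(String userId, long timestampMs) {
        openCarts.putIfAbsent(userId, timestampMs);
    }

    /** A purchase completes the sequence, so the partial match is dropped. */
    void onPurchase(String userId, long timestampMs) {
        openCarts.remove(userId);
    }

    /**
     * Stand-in for event-time progress. Assumes it is called regularly and
     * with non-decreasing time. Purges every partial match older than the
     * timeout and returns those users as "Abandoned Cart" candidates.
     */
    List<String> advanceTime(long nowMs) {
        List<String> abandoned = new ArrayList<>();
        Iterator<Map.Entry<String, Long>> it = openCarts.entrySet().iterator();
        while (it.hasNext()) {
            Map.Entry<String, Long> entry = it.next();
            if (nowMs - entry.getValue() >= TIMEOUT_MS) {
                abandoned.add(entry.getKey());
                it.remove(); // state is released here; without this the map only grows
            }
        }
        return abandoned;
    }

    /** Number of partial matches currently held in memory. */
    int stateSize() {
        return openCarts.size();
    }
}
```

Without the purge in `advanceTime`, every user who ever added an item and never purchased would stay in the map forever, which is the unbounded-state situation described above.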

How complex would a pattern be (how many stages, what kind of payload)? Depending on this, we should be able to estimate the resource requirements. Alternatively, you could give it a try and see how small you can make the cluster.
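As a rough starting point for such an estimate, the raw size of the buffered partial matches can be approximated as keys times buffered events per key times bytes per event. The sketch below is a back-of-envelope calculation only; the class `CepStateEstimate` and its parameters are hypothetical, and real numbers depend on the pattern, the event payload, serialization, and state backend overhead.

```java
/** Back-of-envelope estimate; hypothetical helper, all inputs are placeholders. */
class CepStateEstimate {
    /**
     * Raw bytes held by open partial matches, ignoring serialization
     * and state backend overhead.
     *
     * @param activeKeys           users with an open partial match
     * @param bufferedEventsPerKey events retained per open partial match
     * @param bytesPerEvent        assumed size of one buffered event
     */
    static long estimateBytes(long activeKeys, int bufferedEventsPerKey, int bytesPerEvent) {
        // multiplyExact throws ArithmeticException on overflow instead of wrapping
        long events = Math.multiplyExact(activeKeys, (long) bufferedEventsPerKey);
        return Math.multiplyExact(events, (long) bytesPerEvent);
    }
}
```

For example, assuming all 10 million users have an open partial match with 5 buffered events of 200 bytes each, `CepStateEstimate.estimateBytes(10_000_000L, 5, 200)` gives 10^10 bytes, roughly 10 GB of raw state spread across the cluster. The figures of 5 events and 200 bytes are made up for illustration.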

Great to hear that you enjoyed the conference :-)

Cheers,
Till

On Thu, Sep 15, 2016 at 6:13 PM, David Koch <[hidden email]> wrote:
> [...]