Contiguity and state storage in CEP library

classic Classic list List threaded Threaded
3 messages Options
Reply | Threaded
Open this post in threaded view
|

Contiguity and state storage in CEP library

Tejas
This post was updated on .
We are evaluating a use-case where there will be 100s of events stream coming
in per second and we want to run some fixed set of pattern matching rules on
them And I use relaxed contiguity rules as described in the documentation.
for example :
a pattern sequence "a b+ c" on the stream of "a", "b1", "d1", "b2", "d2",
"b3" "c" will have results as -- {a b1 c}, {a b1 b2 c}, {a b1 b2 b3 c}, {a
b2 c}, {a b2 b3 c}, {a b3 c}
and I specify time window to be 60 mins using within() clause for this
pattern.


Does this mean that the events which don't match i.e. "d1","d2" won't be stored
in state at all ? does the CEP store only matching events in the state for
60 minutes ?

This question is important to estimate the state backend size required for
the usecase and to make sure that the application doesn't go out of memory
due to ever increasing state.



--
Sent from: http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/
Reply | Threaded
Open this post in threaded view
|

Re: Contiguity and state storage in CEP library

Dawid Wysakowicz-2
Hi,

Yes you are correct that if an event can not match any pattern it won't
be stored in state. If you process your records in event time it might
be stored for a little while before processing in order to sort the
incoming records based on time. Once a Watermark with a higher timestamp
comes it will be processed and if it does not match it will be discarded
and it won't be stored any longer.

Best,

Dawid

On 21/04/2021 02:44, tbud wrote:

> We are evaluating a use-case where there will be 100s of events stream coming
> in per second and we want to run some fixed set of pattern matching rules on
> them And I use relaxed contiguity rules as described in the documentation.
> for example :
> /a pattern sequence "a b+ c" on the stream of "a", "b1", "d1", "b2", "d2",
> "b3" "c" will have results as -- {a b1 c}, {a b1 b2 c}, {a b1 b2 b3 c}, {a
> b2 c}, {a b2 b3 c}, {a b3 c}
> and I specify time window to be 60 mins using within() clause for this
> pattern.
> /
> Does this mean that the events which don't match i.e. "d2" won't be stored
> in state at all ? does the CEP store only matching events in the state for
> 60 minutes ?
>
> This question is important to estimate the state backend size required for
> the usecase and to make sure that the application doesn't go out of memory
> due to ever increasing state.
>
>
>
> --
> Sent from: http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/


OpenPGP_signature (855 bytes) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: Contiguity and state storage in CEP library

Tejas
Thanks Dawid. As I see in the code the buffered storage in between watermarks
is stored in /MapState<Long, List&lt;IN>> elementQueueState /variable in
/class CepOperator/. My question is, if we use rocksDb or some other state
backend then would this state be stored on that and checkpointed ? or is it
always in the heap for faster access ?
Also since this storage is between the watermarks, in a high volume
application a watermarking strategy becomes important, is that correct
assumption ?



--
Sent from: http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/