This post was updated on .
We are evaluating a use-case where there will be 100s of events stream coming
in per second and we want to run some fixed set of pattern matching rules on them And I use relaxed contiguity rules as described in the documentation. for example : a pattern sequence "a b+ c" on the stream of "a", "b1", "d1", "b2", "d2", "b3" "c" will have results as -- {a b1 c}, {a b1 b2 c}, {a b1 b2 b3 c}, {a b2 c}, {a b2 b3 c}, {a b3 c} and I specify time window to be 60 mins using within() clause for this pattern. Does this mean that the events which don't match i.e. "d1","d2" won't be stored in state at all ? does the CEP store only matching events in the state for 60 minutes ? This question is important to estimate the state backend size required for the usecase and to make sure that the application doesn't go out of memory due to ever increasing state. -- Sent from: http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/ |
Hi,
Yes you are correct that if an event can not match any pattern it won't be stored in state. If you process your records in event time it might be stored for a little while before processing in order to sort the incoming records based on time. Once a Watermark with a higher timestamp comes it will be processed and if it does not match it will be discarded and it won't be stored any longer. Best, Dawid On 21/04/2021 02:44, tbud wrote: > We are evaluating a use-case where there will be 100s of events stream coming > in per second and we want to run some fixed set of pattern matching rules on > them And I use relaxed contiguity rules as described in the documentation. > for example : > /a pattern sequence "a b+ c" on the stream of "a", "b1", "d1", "b2", "d2", > "b3" "c" will have results as -- {a b1 c}, {a b1 b2 c}, {a b1 b2 b3 c}, {a > b2 c}, {a b2 b3 c}, {a b3 c} > and I specify time window to be 60 mins using within() clause for this > pattern. > / > Does this mean that the events which don't match i.e. "d2" won't be stored > in state at all ? does the CEP store only matching events in the state for > 60 minutes ? > > This question is important to estimate the state backend size required for > the usecase and to make sure that the application doesn't go out of memory > due to ever increasing state. > > > > -- > Sent from: http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/ OpenPGP_signature (855 bytes) Download Attachment |
Thanks Dawid. As I see in the code the buffered storage in between watermarks
is stored in /MapState<Long, List<IN>> elementQueueState /variable in /class CepOperator/. My question is, if we use rocksDb or some other state backend then would this state be stored on that and checkpointed ? or is it always in the heap for faster access ? Also since this storage is between the watermarks, in a high volume application a watermarking strategy becomes important, is that correct assumption ? -- Sent from: http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/ |
Free forum by Nabble | Edit this page |