CEP and KeyedStreams doubt

CEP and KeyedStreams doubt

Oriol
The author has deleted this message.

Re: CEP and KeyedStreams doubt

Kostas Kloudas
Hi Oriol,

The number of keys is directly related to the number of data structures (NFAs) that Flink is going to create and keep.
Given this, it may make sense to try to reduce your key space (what you call your KeyedStreams). Other than that, Flink
has no issue handling large numbers of keys.
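
To make the relationship between keys and per-key structures concrete, here is a minimal sketch in plain Java. It does not use the Flink API: `Event`, `MatcherState`, and `partition` are hypothetical names, and `MatcherState` only stands in for the NFA kept for each key. It shows that keying on (customer, user) produces one state object per user, while keying on customer alone produces one per customer.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

public class KeySpaceSketch {

    /** Hypothetical event: one action by one user of one customer. */
    static final class Event {
        final String customer;
        final String user;

        Event(String customer, String user) {
            this.customer = customer;
            this.user = user;
        }
    }

    /** Stand-in for the per-key structure (an NFA in the CEP library). */
    static final class MatcherState {
        int eventsSeen;
    }

    /** Groups events by key and lazily creates one state object per distinct key. */
    static <K> Map<K, MatcherState> partition(List<Event> events,
                                              Function<Event, K> keySelector) {
        Map<K, MatcherState> stateByKey = new HashMap<>();
        for (Event e : events) {
            K key = keySelector.apply(e);
            stateByKey.computeIfAbsent(key, k -> new MatcherState()).eventsSeen++;
        }
        return stateByKey;
    }

    public static void main(String[] args) {
        List<Event> events = Arrays.asList(
                new Event("acme", "alice"),
                new Event("acme", "bob"),
                new Event("acme", "alice"),
                new Event("globex", "carol"));

        // Composite key: one state object per (customer, user) pair.
        Map<List<String>, MatcherState> perUser =
                partition(events, e -> Arrays.asList(e.customer, e.user));

        // Coarser key: one state object per customer.
        Map<String, MatcherState> perCustomer =
                partition(events, e -> e.customer);

        System.out.println("states keyed by (customer, user): " + perUser.size());   // prints 3
        System.out.println("states keyed by customer: " + perCustomer.size());       // prints 2
    }
}
```

With the coarser key, the events of different users of the same customer share one state object, so any per-user logic would have to be handled separately inside it.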

Now, for the issue you mentioned, we hope to get it fixed soon, but there is no concrete timeline yet.

Hope this helps!

Let us know if you have any issues,
Kostas

On Jan 26, 2017, at 1:04 PM, Oriol <[hidden email]> wrote:

Hello everyone, 
 
I'm using the CEP library for event stream processing. 
 
I'm splitting the DataStream into different KeyedStreams using keyBy(). In keyBy(), I'm using a tuple of two elements as the key, which means I may have several million KeyedStreams, as I need to monitor all of our customers' users. 
 
Is this the preferred way to use Flink, or should I find a way to reduce the number of KeyedStreams, for example having one per customer instead of one per customer's user? (And find a way later to process each user by itself).
 
Also, is the bug reported in https://issues.apache.org/jira/browse/FLINK-5174 related to keys of the KeyedStreams? I'm not sure what kind of keys it is related to. If so, is it going to be addressed soon?

Thanks,
Oriol.