Hi,
I am new to Flink and in general data processing using stream processors. I am using flink to do real time correlation between multiple records which are coming as part of same stream. I am doing is "apply" operation on TimeWindowed stream. When I submit job with parallelism factor of 4, I am still seeing apply operation is applied with parallelism factor of 1. Here is the peice of code : parsedInput.keyBy("mflowHash") I am trying to correlate 2 streams, what is the right way to do it? I tried the CEP library and experienced the worst performance. It is taking ~4 minutes to do the correlation. The corelation logic is very simple and not compute intensive. Thanks -Ashish Attarde Thanks -Ashish Attarde |
Hi,
timeWindowAll is a non parallel operation, since it gathers all of the elements and process them together: Note that it’s defined in DataStream, not in the KeyedStream. In your keyBy example keyBy() is just a NoOp. Didn’t you mean to use KeyedStream#timeWindows method? Piotrek
|
Thanks Piotrek for your response. Teena responsed for same. I am implementing changes to try it out. Yes, Originally I did call keyBy for same reason so that I can parallelize the operation. On Thu, Mar 1, 2018 at 1:24 AM, Piotr Nowojski <[hidden email]> wrote:
Thanks -Ashish Attarde |
Free forum by Nabble | Edit this page |