Fwd: Hi Flink Team

classic Classic list List threaded Threaded
3 messages Options
Reply | Threaded
Open this post in threaded view
|

Fwd: Hi Flink Team

Ashish Attarde
Hi,

I am new to Flink and in general data processing using stream processors.

I am using flink to do real time correlation between multiple records which are coming as part of same stream. I am doing is "apply" operation on TimeWindowed stream. When I submit job with parallelism factor of 4, I am still seeing apply operation is applied with parallelism factor of 1.

Here is the peice of code :

parsedInput.keyBy("mflowHash")
.timeWindowAll(Time.milliseconds(1000),Time.milliseconds(200))
.allowedLateness(Time.seconds(10))
.apply(new CRWindow());

I am trying to correlate 2 streams, what is the right way to do it? I tried the CEP library and experienced the worst performance. It is taking ~4 minutes to do the correlation. The corelation logic is very simple and not compute intensive.


--

Thanks
-Ashish Attarde



--

Thanks
-Ashish Attarde
Reply | Threaded
Open this post in threaded view
|

Re: Hi Flink Team

Piotr Nowojski
Hi,

timeWindowAll is a non parallel operation, since it gathers all of the elements and process them together:


Note that it’s defined in DataStream, not in the KeyedStream.

In your keyBy example keyBy() is just a NoOp. Didn’t you mean to use KeyedStream#timeWindows method?

Piotrek

On 1 Mar 2018, at 09:21, Ashish Attarde <[hidden email]> wrote:

Hi,

I am new to Flink and in general data processing using stream processors.

I am using flink to do real time correlation between multiple records which are coming as part of same stream. I am doing is "apply" operation on TimeWindowed stream. When I submit job with parallelism factor of 4, I am still seeing apply operation is applied with parallelism factor of 1.

Here is the peice of code :

parsedInput.keyBy("mflowHash")
.timeWindowAll(Time.milliseconds(1000),Time.milliseconds(200))
.allowedLateness(Time.seconds(10))
.apply(new CRWindow());

I am trying to correlate 2 streams, what is the right way to do it? I tried the CEP library and experienced the worst performance. It is taking ~4 minutes to do the correlation. The corelation logic is very simple and not compute intensive.


--

Thanks
-Ashish Attarde



--

Thanks
-Ashish Attarde

Reply | Threaded
Open this post in threaded view
|

Re: Hi Flink Team

Ashish Attarde
Thanks Piotrek for your response. Teena responsed for same. I am implementing changes to try it out.

Yes, Originally I did call keyBy for same reason so that I can parallelize the operation.

On Thu, Mar 1, 2018 at 1:24 AM, Piotr Nowojski <[hidden email]> wrote:
Hi,

timeWindowAll is a non parallel operation, since it gathers all of the elements and process them together:


Note that it’s defined in DataStream, not in the KeyedStream.

In your keyBy example keyBy() is just a NoOp. Didn’t you mean to use KeyedStream#timeWindows method?

Piotrek

On 1 Mar 2018, at 09:21, Ashish Attarde <[hidden email]> wrote:

Hi,

I am new to Flink and in general data processing using stream processors.

I am using flink to do real time correlation between multiple records which are coming as part of same stream. I am doing is "apply" operation on TimeWindowed stream. When I submit job with parallelism factor of 4, I am still seeing apply operation is applied with parallelism factor of 1.

Here is the peice of code :

parsedInput.keyBy("mflowHash")
.timeWindowAll(Time.milliseconds(1000),Time.milliseconds(200))
.allowedLateness(Time.seconds(10))
.apply(new CRWindow());

I am trying to correlate 2 streams, what is the right way to do it? I tried the CEP library and experienced the worst performance. It is taking ~4 minutes to do the correlation. The corelation logic is very simple and not compute intensive.


--

Thanks
-Ashish Attarde



--

Thanks
-Ashish Attarde




--

Thanks
-Ashish Attarde