Parallel CEP

classic Classic list List threaded Threaded
8 messages Options
Reply | Threaded
Open this post in threaded view
|

Parallel CEP

dhanuka.priyanath
Hi All,

Is there way to run CEP function parallel. Currently CEP run only sequentially

flink-CEP.png
.

Cheers,
Dhanuka

--
Nothing Impossible,Creativity is more important than knowledge.
Reply | Threaded
Open this post in threaded view
|

Re: Parallel CEP

Dian Fu
Hi Dhanuka,

In order to make the CEP operator to run parallel, the input stream should be KeyedStream. You can refer [1] for detailed information.

Regards,
Dian

[1]: https://ci.apache.org/projects/flink/flink-docs-master/dev/libs/cep.html#detecting-patterns

> 在 2019年1月24日,上午10:18,dhanuka ranasinghe <[hidden email]> 写道:
>
> Hi All,
>
> Is there way to run CEP function parallel. Currently CEP run only sequentially
>
> <flink-CEP.png>
> .
>
> Cheers,
> Dhanuka
>
> --
> Nothing Impossible,Creativity is more important than knowledge.

Reply | Threaded
Open this post in threaded view
|

Re: Parallel CEP

dhanuka.priyanath
Hi Dian,

I tried that but then kafkaproducer only produce to single partition and only single flink host working while rest not contribute for processing . I will share the code and screenshot

Cheers 
Dhanuka

On Thu, 24 Jan 2019, 12:31 Dian Fu <[hidden email] wrote:
Hi Dhanuka,

In order to make the CEP operator to run parallel, the input stream should be KeyedStream. You can refer [1] for detailed information.

Regards,
Dian

[1]: https://ci.apache.org/projects/flink/flink-docs-master/dev/libs/cep.html#detecting-patterns

> 在 2019年1月24日,上午10:18,dhanuka ranasinghe <[hidden email]> 写道:
>
> Hi All,
>
> Is there way to run CEP function parallel. Currently CEP run only sequentially
>
> <flink-CEP.png>
> .
>
> Cheers,
> Dhanuka
>
> --
> Nothing Impossible,Creativity is more important than knowledge.

Reply | Threaded
Open this post in threaded view
|

Re: Parallel CEP

Dian Fu
Whether using KeyedStream depends on the logic of your job, i.e, whether you are looking for patterns for some partitions, i.e, patterns for a particular user. If so, you should partition the input data before the CEP operator. Otherwise, the input data should not be partitioned.

Regards,
Dian 

在 2019年1月24日,下午12:37,dhanuka ranasinghe <[hidden email]> 写道:

Hi Dian,

I tried that but then kafkaproducer only produce to single partition and only single flink host working while rest not contribute for processing . I will share the code and screenshot

Cheers 
Dhanuka

On Thu, 24 Jan 2019, 12:31 Dian Fu <[hidden email] wrote:
Hi Dhanuka,

In order to make the CEP operator to run parallel, the input stream should be KeyedStream. You can refer [1] for detailed information.

Regards,
Dian

[1]: https://ci.apache.org/projects/flink/flink-docs-master/dev/libs/cep.html#detecting-patterns

> 在 2019年1月24日,上午10:18,dhanuka ranasinghe <[hidden email]> 写道:
>
> Hi All,
>
> Is there way to run CEP function parallel. Currently CEP run only sequentially
>
> <flink-CEP.png>
> .
>
> Cheers,
> Dhanuka
>
> --
> Nothing Impossible,Creativity is more important than knowledge.


Reply | Threaded
Open this post in threaded view
|

Re: Parallel CEP

dhanuka.priyanath
Hi Dian,

Thanks for the explanation. Please find the screen shot and source code for above mention use case. And in main issue is though I use KeyedStream , parallelism not apply properly.
Only one host is processing messages.

Flink-CEP-Keyby.png

Cheers,
Dhanuka

On Thu, Jan 24, 2019 at 1:40 PM Dian Fu <[hidden email]> wrote:
Whether using KeyedStream depends on the logic of your job, i.e, whether you are looking for patterns for some partitions, i.e, patterns for a particular user. If so, you should partition the input data before the CEP operator. Otherwise, the input data should not be partitioned.

Regards,
Dian 

在 2019年1月24日,下午12:37,dhanuka ranasinghe <[hidden email]> 写道:

Hi Dian,

I tried that but then kafkaproducer only produce to single partition and only single flink host working while rest not contribute for processing . I will share the code and screenshot

Cheers 
Dhanuka

On Thu, 24 Jan 2019, 12:31 Dian Fu <[hidden email] wrote:
Hi Dhanuka,

In order to make the CEP operator to run parallel, the input stream should be KeyedStream. You can refer [1] for detailed information.

Regards,
Dian

[1]: https://ci.apache.org/projects/flink/flink-docs-master/dev/libs/cep.html#detecting-patterns

> 在 2019年1月24日,上午10:18,dhanuka ranasinghe <[hidden email]> 写道:
>
> Hi All,
>
> Is there way to run CEP function parallel. Currently CEP run only sequentially
>
> <flink-CEP.png>
> .
>
> Cheers,
> Dhanuka
>
> --
> Nothing Impossible,Creativity is more important than knowledge.




--
Nothing Impossible,Creativity is more important than knowledge.

FlinkCEP.java (13K) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: Parallel CEP

Dian Fu
Hi Dhanuka,

Does the KeySelector of Event::getTriggerID generate the same key for all the inputs or only generate very few key values and these key values happen to be hashed to the same downstream operator? You can print the results of Event::getTriggerID to check if it's that case.

Regards,
Dian

在 2019年1月24日,下午2:08,dhanuka ranasinghe <[hidden email]> 写道:

Hi Dian,

Thanks for the explanation. Please find the screen shot and source code for above mention use case. And in main issue is though I use KeyedStream , parallelism not apply properly.
Only one host is processing messages.

<Flink-CEP-Keyby.png>

Cheers,
Dhanuka

On Thu, Jan 24, 2019 at 1:40 PM Dian Fu <[hidden email]> wrote:
Whether using KeyedStream depends on the logic of your job, i.e, whether you are looking for patterns for some partitions, i.e, patterns for a particular user. If so, you should partition the input data before the CEP operator. Otherwise, the input data should not be partitioned.

Regards,
Dian 

在 2019年1月24日,下午12:37,dhanuka ranasinghe <[hidden email]> 写道:

Hi Dian,

I tried that but then kafkaproducer only produce to single partition and only single flink host working while rest not contribute for processing . I will share the code and screenshot

Cheers 
Dhanuka

On Thu, 24 Jan 2019, 12:31 Dian Fu <[hidden email] wrote:
Hi Dhanuka,

In order to make the CEP operator to run parallel, the input stream should be KeyedStream. You can refer [1] for detailed information.

Regards,
Dian

[1]: https://ci.apache.org/projects/flink/flink-docs-master/dev/libs/cep.html#detecting-patterns

> 在 2019年1月24日,上午10:18,dhanuka ranasinghe <[hidden email]> 写道:
>
> Hi All,
>
> Is there way to run CEP function parallel. Currently CEP run only sequentially
>
> <flink-CEP.png>
> .
>
> Cheers,
> Dhanuka
>
> --
> Nothing Impossible,Creativity is more important than knowledge.




--
Nothing Impossible,Creativity is more important than knowledge.
<FlinkCEP.java>

Reply | Threaded
Open this post in threaded view
|

Re: Parallel CEP

dhanuka.priyanath
In this example key will be same. I am using 1 million messages with same key for performance testing. But still I want to process them parallel. Can't I use Split function and get a SplitStream for that purpose?

On Thu, Jan 24, 2019 at 2:49 PM Dian Fu <[hidden email]> wrote:
Hi Dhanuka,

Does the KeySelector of Event::getTriggerID generate the same key for all the inputs or only generate very few key values and these key values happen to be hashed to the same downstream operator? You can print the results of Event::getTriggerID to check if it's that case.

Regards,
Dian

在 2019年1月24日,下午2:08,dhanuka ranasinghe <[hidden email]> 写道:

Hi Dian,

Thanks for the explanation. Please find the screen shot and source code for above mention use case. And in main issue is though I use KeyedStream , parallelism not apply properly.
Only one host is processing messages.

<Flink-CEP-Keyby.png>

Cheers,
Dhanuka

On Thu, Jan 24, 2019 at 1:40 PM Dian Fu <[hidden email]> wrote:
Whether using KeyedStream depends on the logic of your job, i.e, whether you are looking for patterns for some partitions, i.e, patterns for a particular user. If so, you should partition the input data before the CEP operator. Otherwise, the input data should not be partitioned.

Regards,
Dian 

在 2019年1月24日,下午12:37,dhanuka ranasinghe <[hidden email]> 写道:

Hi Dian,

I tried that but then kafkaproducer only produce to single partition and only single flink host working while rest not contribute for processing . I will share the code and screenshot

Cheers 
Dhanuka

On Thu, 24 Jan 2019, 12:31 Dian Fu <[hidden email] wrote:
Hi Dhanuka,

In order to make the CEP operator to run parallel, the input stream should be KeyedStream. You can refer [1] for detailed information.

Regards,
Dian

[1]: https://ci.apache.org/projects/flink/flink-docs-master/dev/libs/cep.html#detecting-patterns

> 在 2019年1月24日,上午10:18,dhanuka ranasinghe <[hidden email]> 写道:
>
> Hi All,
>
> Is there way to run CEP function parallel. Currently CEP run only sequentially
>
> <flink-CEP.png>
> .
>
> Cheers,
> Dhanuka
>
> --
> Nothing Impossible,Creativity is more important than knowledge.




--
Nothing Impossible,Creativity is more important than knowledge.
<FlinkCEP.java>



--
Nothing Impossible,Creativity is more important than knowledge.
Reply | Threaded
Open this post in threaded view
|

Re: Parallel CEP

Dian Fu
I'm afraid you cannot do that. The inputs having the same key should be processed by the same CEP operator. Otherwise the results will be nondeterministic and also be wrong.

Regards,
Dian

在 2019年1月24日,下午2:56,dhanuka ranasinghe <[hidden email]> 写道:

In this example key will be same. I am using 1 million messages with same key for performance testing. But still I want to process them parallel. Can't I use Split function and get a SplitStream for that purpose?

On Thu, Jan 24, 2019 at 2:49 PM Dian Fu <[hidden email]> wrote:
Hi Dhanuka,

Does the KeySelector of Event::getTriggerID generate the same key for all the inputs or only generate very few key values and these key values happen to be hashed to the same downstream operator? You can print the results of Event::getTriggerID to check if it's that case.

Regards,
Dian

在 2019年1月24日,下午2:08,dhanuka ranasinghe <[hidden email]> 写道:

Hi Dian,

Thanks for the explanation. Please find the screen shot and source code for above mention use case. And in main issue is though I use KeyedStream , parallelism not apply properly.
Only one host is processing messages.

<Flink-CEP-Keyby.png>

Cheers,
Dhanuka

On Thu, Jan 24, 2019 at 1:40 PM Dian Fu <[hidden email]> wrote:
Whether using KeyedStream depends on the logic of your job, i.e, whether you are looking for patterns for some partitions, i.e, patterns for a particular user. If so, you should partition the input data before the CEP operator. Otherwise, the input data should not be partitioned.

Regards,
Dian 

在 2019年1月24日,下午12:37,dhanuka ranasinghe <[hidden email]> 写道:

Hi Dian,

I tried that but then kafkaproducer only produce to single partition and only single flink host working while rest not contribute for processing . I will share the code and screenshot

Cheers 
Dhanuka

On Thu, 24 Jan 2019, 12:31 Dian Fu <[hidden email] wrote:
Hi Dhanuka,

In order to make the CEP operator to run parallel, the input stream should be KeyedStream. You can refer [1] for detailed information.

Regards,
Dian

[1]: https://ci.apache.org/projects/flink/flink-docs-master/dev/libs/cep.html#detecting-patterns

> 在 2019年1月24日,上午10:18,dhanuka ranasinghe <[hidden email]> 写道:
>
> Hi All,
>
> Is there way to run CEP function parallel. Currently CEP run only sequentially
>
> <flink-CEP.png>
> .
>
> Cheers,
> Dhanuka
>
> --
> Nothing Impossible,Creativity is more important than knowledge.




--
Nothing Impossible,Creativity is more important than knowledge.
<FlinkCEP.java>



--
Nothing Impossible,Creativity is more important than knowledge.