Best Practices/Advice - Execution of jobs


Best Practices/Advice - Execution of jobs

PedroMrChaves
Hello,

I'm trying to build a stream event correlation engine with Flink, and I have some questions regarding the execution of jobs.

In my architecture I need to have different sources of data; let's say, for instance:
firewallStream = environment.addSource([FirewalLogsSource]);
proxyStream = environment.addSource([ProxyLogsSource]);

and for each of these sources I need to apply a set of rules.
So let's say I have a job whose source is the proxy stream data, with the following rules:

//Abnormal Request Method
stream.[RuleLogic].addSink([output])
//Web Service on Non-Typical Port
stream.[RuleLogic].addSink([output])
//Possible Brute Force
stream.[RuleLogic].addSink([output])


These rules will probably grow to somewhere on the order of 15 to 20.
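
Concretely, each rule would look something like the sketch below. The conditions, the ProxyEvent/ProxyLogsSource/AlertSink classes and the sink names are made up just to illustrate the shape of a rule, not my actual implementation:

    // Sketch only: ProxyEvent, ProxyLogsSource and AlertSink are
    // hypothetical stand-ins for my real classes.
    StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
    DataStream<ProxyEvent> proxyStream = env.addSource(new ProxyLogsSource());

    // Rule: Abnormal Request Method -- flag anything outside the usual HTTP verbs
    proxyStream
        .filter(e -> !"GET".equals(e.getMethod()) && !"POST".equals(e.getMethod()))
        .addSink(new AlertSink("abnormal-request-method"));

    // Rule: Web Service on Non-Typical Port -- flag traffic off ports 80/443
    proxyStream
        .filter(e -> e.getPort() != 80 && e.getPort() != 443)
        .addSink(new AlertSink("non-typical-port"));

    env.execute("proxy-rules");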

What is the best approach in this case?
1. Should I create two jobs, one for each source, where each job has all 15-20 rules?
2. Should I split the rules across several jobs?
3. Are there other options?


Thank you and Regards,
Pedro Chaves.


Re: Best Practices/Advice - Execution of jobs

Aljoscha Krettek
Hi Pedro,
I think it would be better to have two jobs and to keep all the rules in one place. If you don't have too many sources, you might even consider having everything in one job so that you don't have to duplicate the rules.

There's a tradeoff, though: if a single job accumulates too much logic, splitting it up becomes beneficial because smaller jobs are easier to maintain and monitor.
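
For example (just a rough sketch with placeholder names -- Event, FirewallLogsSource, ProxyLogsSource, AlertSink and the isAbnormalRequestMethod/isNonTypicalPort helpers would all be your own types), the rules could live in a single helper that every source, or every job, reuses:

    // All rule definitions in one place, reusable across sources and jobs
    public static void applyRules(DataStream<Event> stream) {
        stream.filter(e -> isAbnormalRequestMethod(e))
              .addSink(new AlertSink("abnormal-request-method"));
        stream.filter(e -> isNonTypicalPort(e))
              .addSink(new AlertSink("non-typical-port"));
        // ... the remaining 15-20 rules
    }

    // Variant with everything in one job:
    StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
    applyRules(env.addSource(new FirewallLogsSource()));
    applyRules(env.addSource(new ProxyLogsSource()));
    env.execute("correlation-engine");

That way, whether you run one combined job or one job per source becomes a deployment decision, and the rule logic itself is never duplicated.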

Cheers,
Aljoscha


Re: Best Practices/Advice - Execution of jobs

PedroMrChaves
Thank you.
Best Regards,
Pedro Chaves