Best Practices/Advice - Execution of jobs


Best Practices/Advice - Execution of jobs

PedroMrChaves
Hello,

I'm trying to build a stream event correlation engine with Flink, and I have some questions regarding the execution of jobs.

In my architecture I need to have different sources of data; let's say, for instance:
firewallStream = environment.addSource([FirewalLogsSource]);
proxyStream = environment.addSource([ProxyLogsSource]);

and for each of these sources I need to apply a set of rules.
So let's say I have a job whose source is the proxy stream data, with the following rules:

//Abnormal Request Method
stream.[RuleLogic].addSink([output])
//Web Service on Non-Typical Port
stream.[RuleLogic].addSink([output])
//Possible Brute Force
stream.[RuleLogic].addSink([output])


These rules will probably grow to somewhere on the order of 15 to 20.
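
Concretely, each rule would look something like the sketch below. The conditions, the ProxyEvent/ProxyLogsSource/AlertSink classes and the sink names are made up just to illustrate the shape of a rule, not my actual implementation:

    // Sketch only: ProxyEvent, ProxyLogsSource and AlertSink are
    // hypothetical stand-ins for my real classes.
    StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
    DataStream<ProxyEvent> proxyStream = env.addSource(new ProxyLogsSource());

    // Rule: Abnormal Request Method -- flag anything outside the usual HTTP verbs
    proxyStream
        .filter(e -> !"GET".equals(e.getMethod()) && !"POST".equals(e.getMethod()))
        .addSink(new AlertSink("abnormal-request-method"));

    // Rule: Web Service on Non-Typical Port -- flag traffic off ports 80/443
    proxyStream
        .filter(e -> e.getPort() != 80 && e.getPort() != 443)
        .addSink(new AlertSink("non-typical-port"));

    env.execute("proxy-rules");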

What is the best approach in this case?
1. Should I create two jobs, one for each source, where each job has all 15-20 rules?
2. Should I split the rules across several jobs?
3. Are there other options?


Thank you and Regards,
Pedro Chaves.


Re: Best Practices/Advice - Execution of jobs

Aljoscha Krettek
Hi Pedro,
I think it would be better to have two jobs and to keep all the rules in one place. If you don't have too many sources, you might even consider having everything in one job so that you don't have to duplicate the rules.

There's a tradeoff, though: if a single job accumulates too much logic, splitting it up becomes beneficial because smaller jobs are easier to maintain and monitor.
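
For example (just a rough sketch with placeholder names -- Event, FirewallLogsSource, ProxyLogsSource, AlertSink and the isAbnormalRequestMethod/isNonTypicalPort helpers would all be your own types), the rules could live in a single helper that every source, or every job, reuses:

    // All rule definitions in one place, reusable across sources and jobs
    public static void applyRules(DataStream<Event> stream) {
        stream.filter(e -> isAbnormalRequestMethod(e))
              .addSink(new AlertSink("abnormal-request-method"));
        stream.filter(e -> isNonTypicalPort(e))
              .addSink(new AlertSink("non-typical-port"));
        // ... the remaining 15-20 rules
    }

    // Variant with everything in one job:
    StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
    applyRules(env.addSource(new FirewallLogsSource()));
    applyRules(env.addSource(new ProxyLogsSource()));
    env.execute("correlation-engine");

That way, whether you run one combined job or one job per source becomes a deployment decision, and the rule logic itself is never duplicated.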

Cheers,
Aljoscha


Re: Best Practices/Advice - Execution of jobs

PedroMrChaves
Thank you.
Best Regards,
Pedro Chaves