Many streaming jobs vs one

Tarandeep Singh
Hi,

I am running a Flink cluster to process clickstream data (generating user-level, page-level, and site-level statistics).

I want to understand the pros and cons of submitting multiple jobs (each handling one simple computation) versus one or a few complex jobs. At present, the events are read from a single Kafka topic (this may change in the future, and I may have multiple topics). Here are my thoughts:

Multiple simple jobs: a failure or bug in one job won't impact the other computations, and each job can be stopped independently of the others.

Are there any overheads or performance penalties? If there are none and this is the recommended way, then I have a follow-up question: can I update the jar without stopping all Flink streaming jobs? That is, stop the job that has a bug (leaving the others running), replace the jar [which contains *all* jobs' code], and then restart the stopped job.

Thanks,
Tarandeep 

Re: Many streaming jobs vs one

Jonas Gröger
I recommend multiple jobs. You can still share most of the code by creating Java / Scala packages. This makes it easier to update jobs.
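A minimal sketch of that layout, under stated assumptions: the names `ClickJobs`, `Click`, `UserStatsJob`, and `PageStatsJob` are hypothetical, the input format `user,page,site` is invented for illustration, and no Flink API is used. Each nested class stands in for one job's entry point; in a real Flink project each would build its own streaming pipeline and be submitted separately from the same jar (for example by naming its entry class with `flink run -c`), while the parsing and counting helpers live in a shared package.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.function.Function;

// Hypothetical shared code: in a real project this would sit in its own
// package and be reused by every job packaged into the jar.
final class ClickJobs {

    // One clickstream event (assumed format: "user,page,site").
    static final class Click {
        final String user;
        final String page;
        final String site;

        Click(String user, String page, String site) {
            this.user = user;
            this.page = page;
            this.site = site;
        }
    }

    // Shared parsing logic used by all jobs.
    static Click parse(String line) {
        String[] f = line.split(",", 3);
        if (f.length != 3) {
            throw new IllegalArgumentException("expected user,page,site: " + line);
        }
        return new Click(f[0].trim(), f[1].trim(), f[2].trim());
    }

    // Shared aggregation: count events per key (sorted for stable output).
    static Map<String, Long> countBy(List<Click> clicks, Function<Click, String> key) {
        Map<String, Long> counts = new TreeMap<>();
        for (Click c : clicks) {
            counts.merge(key.apply(c), 1L, Long::sum);
        }
        return counts;
    }

    // Stand-in for the Kafka source; purely illustrative data.
    static List<Click> sample() {
        return Arrays.asList(
                parse("u1,/home,example.com"),
                parse("u2,/home,example.com"),
                parse("u1,/cart,example.com"));
    }

    // Entry point of the (hypothetical) user-level statistics job.
    static final class UserStatsJob {
        public static void main(String[] args) {
            System.out.println(countBy(sample(), c -> c.user));
        }
    }

    // Entry point of the (hypothetical) page-level statistics job.
    static final class PageStatsJob {
        public static void main(String[] args) {
            System.out.println(countBy(sample(), c -> c.page));
        }
    }
}
```

Because each job has its own entry point, a bug fix in one of them only requires resubmitting that job; the others keep running the code they were started with.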

Re: Many streaming jobs vs one

Ufuk Celebi
As Jonas said, for job upgrades, having a single job that "multiplexes"
multiple jobs means that all of them will be offline at the same time. If
all jobs share a single Flink cluster, it should be fine to use
multiple jobs that share the resources. A downside is
that managing multiple jobs is probably harder than managing just a
single job (keeping track of state, monitoring, etc.).

On Sun, Feb 5, 2017 at 10:43 PM, Jonas <[hidden email]> wrote:
> I recommend multiple jobs. You can still share most of the code by creating
> Java / Scala packages. This makes it easier to update jobs.