Many topologies

classic Classic list List threaded Threaded
2 messages Options
Reply | Threaded
Open this post in threaded view
|

Many topologies

Jack
Hello,

I need to implement an engine for running a large number (several hundred) of different kinds of topologies. The topologies are defined ad-hoc by customers, hence combining the topologies is difficult.

How does Flink perform with so many topologies?

Thanks
Reply | Threaded
Open this post in threaded view
|

Re: Many topologies

Stephan Ewen
Hi Jack!

Interesting use case. I think the answer depends a bit on how you want to make your setup:

Variant 1:
-------------
If you have a YARN setup and allocate a new Flink AppMaster and dedicated workers, you can probably scale to as many topologies as you want (or as many as YARN can handle).


Variant 2:
--------------
If you want to let one (or a few) Flink clusters (deployed in YARN or standalone) handle all the concurrent topologies (because they are short lived, or because you want to minimize the overhead per topology), then it becomes interesting

Resources of the master: Make sure you have a cap with respect to how many jobs you run on one master node. The master keeps a representation of the distributed dataflow, including metadata for the individual streams. For high parallelism, that may require a bit of memory. So for large parallelisms (100s and more parallel instances) I would not have more than 10 jobs per master. For lower parallelism, you can try more on one master node. You can easily have multiple Flink clusters on one YARN cluster, that way you should be able to scale to many concurrent topologies.

Resources on the workers: You can go for many small workers, or for few large workers.

If you add many workers, that way you can handle many topologies and have their operations isolated on JVM level. A worker itself can be rather lightweight, it does not need too much heap space. For streaming, you need to mainly make sure that there is enough network memory for streaming shuffles, and if you want to make many very small workers you may want to set the number of akka threads down to one.

If you go for fewer big workers that each run multiple operators, that is probably more resource efficient, but will also not isolate operations on a process level.

In any case, make sure you start the workers properly in streaming mode, it has an effect on how the managed memory is pre-allocated.

Greetings,
Stephan


On Wed, Sep 23, 2015 at 12:24 PM, Jack <[hidden email]> wrote:
Hello,

I need to implement an engine for running a large number (several hundred) of different kinds of topologies. The topologies are defined ad-hoc by customers, hence combining the topologies is difficult.

How does Flink perform with so many topologies?

Thanks