Memory management in Flink

Emmanuel
Hello,

In Storm, when running a new topology, a new JVM is started and this takes some memory.

In my use case, I may need to run many different topologies.

How does it work in Flink? Is the Flink run command spinning a new JVM each time?
If I run multiple topologies it seems like the additional processes take a lot less memory, but it's not clear to me how that happens.

Can anyone comment on how memory is managed in Flink?

Thanks

Re: Memory management in Flink

rmetzger0
Hi Emmanuel,

Flink does not start new JVMs on the workers when you submit a new topology.
When you start Flink using the "start-cluster.sh" script, it creates the Flink cluster and its JVMs.

Your topologies are then started as threads inside these JVMs. So the overhead per topology is actually quite low, because it's only a few more threads that are started in the cluster.
Also, you can easily run multiple topologies next to each other, as long as they do not consume too much memory and CPU.
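
To illustrate the point above, here is a minimal sketch in plain Java. It is not Flink code: the class name, the thread pool standing in for a worker JVM, and the "jobs" are all hypothetical. It only shows that several tasks submitted to an already running JVM execute as threads of that one process instead of each starting a new JVM.

```java
import java.lang.management.ManagementFactory;
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical illustration only; none of these names come from Flink.
public class SharedJvmSketch {

    // Runs jobCount stand-in "jobs" as threads of one pool and returns the
    // distinct JVM names they observed while running.
    static Set<String> runJobs(int jobCount) throws Exception {
        // The pool stands in for a long-running worker JVM.
        ExecutorService workerJvm = Executors.newFixedThreadPool(jobCount);
        try {
            // Each job just reports which JVM process it is running in.
            Callable<String> job =
                    () -> ManagementFactory.getRuntimeMXBean().getName();

            List<Future<String>> results = new ArrayList<>();
            for (int i = 0; i < jobCount; i++) {
                results.add(workerJvm.submit(job));
            }

            Set<String> jvmNames = new HashSet<>();
            for (Future<String> result : results) {
                jvmNames.add(result.get());
            }
            return jvmNames;
        } finally {
            workerJvm.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println("Distinct JVMs used by 3 jobs: " + runJobs(3).size());
    }
}
```

Every job reports the same JVM, so the cost of adding a job is a few threads plus whatever heap its data needs, not a whole new process.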

Regarding memory management: Flink Streaming operates directly on the JVM heap (unlike Flink batch). So the memory utilization depends heavily on the amount of data currently streaming through the system.
The amount of heap space used is capped by the "taskmanager.heap.mb" configuration setting.


If you're only using Flink streaming (no batch), I would recommend setting the
"taskmanager.memory.fraction" setting to a very low value (0.1), so that most of the heap stays available to your streaming topologies.
As mentioned above, Flink's batch layer uses "managed memory" inside the JVM to avoid OutOfMemory exceptions and to have full control over the memory for spilling to disk etc.
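
As a rough illustration of what the fraction means, the sketch below does the arithmetic for a hypothetical 4096 MB heap. It assumes the fraction is applied directly to the configured heap size and ignores network buffers and other overhead, which is a simplification. The class and method names are made up for this example, and 0.7 is just an arbitrary higher value for comparison.

```java
// Hypothetical helper, not part of Flink: estimates how much of the heap
// would be reserved as managed memory for a given fraction.
public class ManagedMemorySketch {

    // Returns the approximate managed memory in MB, rounded down.
    // Assumption: the fraction applies to the whole configured heap.
    static long managedMb(long heapMb, double memoryFraction) {
        if (heapMb < 0 || memoryFraction < 0.0 || memoryFraction > 1.0) {
            throw new IllegalArgumentException("heap must be >= 0 and fraction in [0, 1]");
        }
        return (long) (heapMb * memoryFraction);
    }

    public static void main(String[] args) {
        long heapMb = 4096; // stands in for taskmanager.heap.mb
        for (double fraction : new double[] {0.1, 0.7}) {
            long managed = managedMb(heapMb, fraction);
            System.out.println("fraction " + fraction
                    + ": ~" + managed + " MB managed, ~"
                    + (heapMb - managed) + " MB left for streaming and user code");
        }
    }
}
```

Under these assumptions, a low fraction leaves most of the heap for the streaming operators, which is the reason for the recommendation above.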


Let me know if you want more details on these topics.

Best,
Robert



On Sat, Mar 14, 2015 at 2:02 AM, Emmanuel <[hidden email]> wrote:

>
> Hello,
>
> In Storm, when running a new topology, a new JVM is started and this takes some memory.
>
> In my use case, I may need to run many different topologies.
>
> How does it work in Flink? Is the Flink run command spinning a new JVM each time?
> If I run multiple topologies it seems like the additional processes take a lot less memory, but it's not clear to me how that happens.
>
> Can anyone comment on how memory is managed in Flink?
>
> Thanks
>