Memory Limits: MiniCluster vs. Local Mode

Dominik Bruhn
Hey,
for our CI/CD cycle I'd like to try out our Flink jobs in a development
environment without running them against a huge EMR cluster (which is
what we do for production), so something like a standalone mode.

Until now, for this standalone running, I just started the job jar. As
the "env.execute()" is in the main method, this works. I think this is
called "Local Mode" by the Flink devs. I packaged the whole thing in a
Docker container so I have a deployable artefact.
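A minimal sketch of this setup, assuming the Flink 1.2-era DataStream API (the class name `MyJob` and the pipeline are hypothetical placeholders): when the jar is started with plain `java`, there is no cluster context, so `getExecutionEnvironment()` falls back to a local environment that runs the job inside the same JVM.

```java
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

// Hypothetical job class; the pipeline is only a placeholder.
public class MyJob {
    public static void main(String[] args) throws Exception {
        // Started with `java -jar`, this returns a local environment that runs
        // the job in this JVM; started with `flink run`, it returns an
        // environment bound to the target cluster instead.
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        env.fromElements("a", "b", "c").print();

        // Blocks until this finite job has finished.
        env.execute("MyJob");
    }
}
```

The same jar therefore behaves as "Local Mode" under `java -jar` and as a cluster job under `flink run`, without code changes.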

The problem with this is that the memory constraints seem to be
difficult to control: setting -Xmx and -Xms for the job doesn't seem to
limit the memory. This is most likely due to Flink's off-heap memory
allocation.
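To illustrate the likely cause, a small stdlib-only Java sketch (not Flink code; the class and method names are hypothetical): memory allocated with `ByteBuffer.allocateDirect` lives outside the Java heap, so it barely moves the heap usage that `-Xmx` bounds. On HotSpot, direct memory is capped separately by `-XX:MaxDirectMemorySize`, which, when unset, defaults to roughly the maximum heap size, so the whole process can use noticeably more than `-Xmx` (before even counting metaspace and thread stacks).

```java
import java.nio.ByteBuffer;

// Stdlib-only illustration, not Flink code: a direct ByteBuffer is
// allocated outside the Java heap, so it barely moves heap usage.
public class DirectMemoryDemo {
    static long usedHeapBytes() {
        Runtime rt = Runtime.getRuntime();
        return rt.totalMemory() - rt.freeMemory();
    }

    // Returns how much the used heap changed while allocating 'size' bytes off-heap.
    static long heapGrowthForDirectAlloc(int size) {
        long before = usedHeapBytes();
        ByteBuffer offHeap = ByteBuffer.allocateDirect(size);
        long after = usedHeapBytes();
        if (offHeap.capacity() != size) {
            throw new IllegalStateException("unexpected capacity");
        }
        return after - before;
    }

    public static void main(String[] args) {
        int size = 64 * 1024 * 1024; // 64 MiB off-heap
        System.out.println("max heap (-Xmx): "
            + Runtime.getRuntime().maxMemory() / (1024 * 1024) + " MiB");
        System.out.println("heap growth after allocating 64 MiB off-heap: "
            + heapGrowthForDirectAlloc(size) / 1024 + " KiB");
    }
}
```

Run with, for example, `java -Xmx256m -XX:MaxDirectMemorySize=128m DirectMemoryDemo`, the reported heap growth should be far below 64 MiB, which is why a container limit sized only to `-Xmx` can be exceeded.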

Now, I got feedback that perhaps the MiniCluster is the way to go
instead of the "Local Mode".

My questions:
1. Is the MiniCluster better than the local mode? What are the use-cases
in which you would choose one over the other?
2. Is there an example of how to use the MiniCluster? I see that I need a
JobGraph; how do I get one?
3. What are the tuning parameters to limit the memory consumption of the
MiniCluster (and maybe the local mode)?
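On question 2, a hedged sketch assuming the Flink 1.2-era API: a `JobGraph` can be produced from a `StreamExecutionEnvironment` by translating its stream graph with `getStreamGraph().getJobGraph()`. Both calls are public but treated as internal, so they may change between releases; the class and method names below are hypothetical.

```java
import org.apache.flink.runtime.jobgraph.JobGraph;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class JobGraphExample {
    // Builds a placeholder pipeline and translates it into a JobGraph
    // without executing it.
    static JobGraph buildJobGraph() {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        env.fromElements(1, 2, 3).print();
        // StreamGraph -> JobGraph: operators are chained into job vertices here,
        // so the vertex count depends on parallelism and chaining.
        return env.getStreamGraph().getJobGraph();
    }

    public static void main(String[] args) {
        JobGraph jobGraph = buildJobGraph();
        System.out.println("job vertices: " + jobGraph.getNumberOfVertices());
    }
}
```

In the 1.2-era runtime, `LocalStreamEnvironment.execute()` does essentially this before handing the graph to a `LocalFlinkMiniCluster` via `submitJobAndWait(jobGraph, printUpdates)`; verify that method against the Flink version actually in use.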

Thanks for your help,
Dominik

Re: Memory Limits: MiniCluster vs. Local Mode

Tzu-Li (Gordon) Tai
Hi Dominik,

AFAIK, the local mode executions create a mini cluster within the JVM to run the job.
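A sketch of requesting that local environment explicitly, assuming the Flink 1.2-era API (class and method names are hypothetical). As far as the 1.2-era source shows, `LocalStreamEnvironment.execute()` translates the job into a `JobGraph`, starts a `LocalFlinkMiniCluster` in the same JVM, submits the graph, and stops the cluster afterwards.

```java
import org.apache.flink.streaming.api.environment.LocalStreamEnvironment;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class ExplicitLocalMode {
    // Explicitly requests the local environment that getExecutionEnvironment()
    // falls back to, but with a fixed parallelism of 2 instead of the number
    // of CPU cores.
    static LocalStreamEnvironment createEnv() {
        return StreamExecutionEnvironment.createLocalEnvironment(2);
    }

    public static void main(String[] args) throws Exception {
        LocalStreamEnvironment env = createEnv();
        env.fromElements("x", "y").print();
        // execute() starts an embedded mini cluster in this JVM, runs the job
        // on it, and shuts the cluster down when the job finishes.
        env.execute("explicit-local");
    }
}
```

Fixing the parallelism also fixes the number of task slots the embedded cluster needs, which makes resource use more predictable than the core-count default.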

Also, `MiniCluster` seems to be part of the FLIP-6 work, and since FLIP-6 is still a work
in progress, I’m not entirely sure it is usable at the moment. For now, you should look
into using `LocalFlinkMiniCluster`.

In a lot of the Flink integration tests, we use a `LocalFlinkMiniCluster` to set up the test
cluster programmatically, and instantiate a `StreamExecutionEnvironment` against that
mini cluster. That would probably be helpful for trying out your Flink jobs programmatically
in your CI/CD cycles.
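A hedged sketch of that pattern, modeled on how the 1.2-era Kafka connector tests start a cluster and connect a remote environment to it. It assumes the `LocalFlinkMiniCluster(Configuration, boolean)` constructor, `getLeaderRPCPort()`, and the string option keys of that era; the class and method names are hypothetical.

```java
import org.apache.flink.api.common.JobExecutionResult;
import org.apache.flink.configuration.Configuration;
import org.apache.flink.runtime.minicluster.LocalFlinkMiniCluster;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class MiniClusterRunner {
    static JobExecutionResult runOnMiniCluster() throws Exception {
        Configuration config = new Configuration();
        config.setInteger("local.number-taskmanager", 1);
        config.setInteger("taskmanager.numberOfTaskSlots", 2);

        // 'false' = do not run all components in a single actor system,
        // mirroring how the Kafka connector tests start their cluster.
        LocalFlinkMiniCluster cluster = new LocalFlinkMiniCluster(config, false);
        cluster.start();
        try {
            // User classes are already on this JVM's classpath, so no jar
            // files are passed to the remote environment.
            StreamExecutionEnvironment env = StreamExecutionEnvironment
                .createRemoteEnvironment("localhost", cluster.getLeaderRPCPort());
            env.setParallelism(2);
            env.fromElements(1, 2, 3).print();
            return env.execute("job-against-mini-cluster");
        } finally {
            cluster.stop();
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println("net runtime (ms): " + runOnMiniCluster().getNetRuntime());
    }
}
```

Unlike plain local mode, the cluster here outlives a single `execute()` call, so several jobs could be submitted before `stop()`.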

You can also take a look at some Flink test utilities such as
`StreamingMultipleProgramsTestBase`, which helps you to set up an environment
that allows you to submit multiple test jobs on a single `LocalFlinkMiniCluster`. For a
simple example of how to use it, you can take a look at the tests in the Elasticsearch
connector. The Flink Kafka connector tests also have a more complicated test
environment setup where jobs are submitted to the `LocalFlinkMiniCluster` using a
remote environment.
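A hedged sketch of a JUnit test built on `StreamingMultipleProgramsTestBase` from `flink-test-utils`, assuming the 1.2-era package `org.apache.flink.streaming.util` (it may have moved or been replaced in later versions). The base class starts a shared cluster before the test class runs and installs it as the context environment; the test class, functions, and assertion are hypothetical.

```java
import static org.junit.Assert.assertEquals;

import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

import org.apache.flink.api.common.functions.MapFunction;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.api.functions.sink.SinkFunction;
import org.apache.flink.streaming.util.StreamingMultipleProgramsTestBase;
import org.junit.Test;

// Hypothetical integration test for a job under test.
public class MyJobITCase extends StreamingMultipleProgramsTestBase {

    // Static, so every parallel sink instance in this JVM appends to the same list.
    private static final List<Integer> RESULTS =
        Collections.synchronizedList(new ArrayList<Integer>());

    private static class Doubler implements MapFunction<Integer, Integer> {
        @Override
        public Integer map(Integer value) {
            return value * 2;
        }
    }

    private static class CollectSink implements SinkFunction<Integer> {
        @Override
        public void invoke(Integer value) {
            RESULTS.add(value);
        }
    }

    @Test
    public void doublesEveryElement() throws Exception {
        RESULTS.clear();
        // The base class installs a context environment backed by its shared
        // LocalFlinkMiniCluster, so this returns an environment bound to it.
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        env.fromElements(1, 2, 3).map(new Doubler()).addSink(new CollectSink());
        env.execute();

        // Parallel sinks may append in any order, so sort before comparing.
        List<Integer> actual = new ArrayList<>(RESULTS);
        Collections.sort(actual);
        assertEquals(Arrays.asList(2, 4, 6), actual);
    }
}
```

The static, synchronized result list only works because the mini cluster runs in the test's own JVM; against a real cluster a different sink would be needed.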

As for configuring the memory consumption of the created `LocalFlinkMiniCluster`, I think
you should be able to tune it using the `Configuration` instance passed to it.
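A hedged sketch of such a `Configuration`, assuming Flink 1.2-era option keys (the TaskManager memory options were reworked in Flink 1.10, so the names differ in later releases; the values are illustrative, not recommendations). The intent is to cap managed memory at a fixed size, keep it on the heap where `-Xmx` applies, and shrink the network buffer pool. The class and method names are hypothetical.

```java
import org.apache.flink.configuration.Configuration;

public class MiniClusterMemoryConfig {
    // Flink 1.2-era keys; check the configuration docs of the version in use.
    static Configuration smallFootprint() {
        Configuration config = new Configuration();
        config.setInteger("local.number-taskmanager", 1);
        config.setInteger("taskmanager.numberOfTaskSlots", 2);
        // Fixed managed memory in MB instead of a fraction of the free heap.
        config.setLong("taskmanager.memory.size", 64L);
        // Keep managed memory on the heap so that -Xmx bounds it.
        config.setBoolean("taskmanager.memory.off-heap", false);
        // Each network buffer is one memory segment (32 KiB by default),
        // so 512 buffers come to about 16 MiB.
        config.setInteger("taskmanager.network.numberOfBuffers", 512);
        return config;
    }
}
```

The resulting object could be passed to `new LocalFlinkMiniCluster(config, false)` or, for the local mode, to the `LocalStreamEnvironment(Configuration)` constructor.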

Hope this helps!

Cheers,
Gordon

(In reply to Dominik's message of March 4, 2017, 12:27:53 AM.)