memory flush on cluster

Pa Rö
hi flink community,

At the moment I am testing my Flink app with a benchmark on a Hadoop cluster (Flink on YARN).
My results show that Flink needs more time for the first round than for all the other rounds. Maybe Flink caches something in memory? And if I run the benchmark for 100 rounds, my system freezes; I think the memory is full. Is there a way to flush the memory after the execution?

best regards,
paul

Re: memory flush on cluster

Stephan Ewen
Currently, Flink does not cache anything across runs, except JAR files on the workers.

The reason the first run is slower may be:
 - Because in the first run, code is distributed in the cluster. In subsequent runs, the JAR files need not be redistributed.
 - Because the JIT takes a bit to kick in and compile code in the first run. In subsequent runs, the code is already JIT-ted.


The system should not freeze after 100 runs. Can you tell us a bit more about what you see? Can you identify which process hangs and send us a stack trace of that one? Then we could look into this...
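
On the warm-up point above: a minimal sketch of how the first round could be separated out when benchmarking, assuming a per-job YARN setup. Submit the same job several times from the client, time each round, and drop the first (warm-up) round from the average. The jar name and the YARN flags below are placeholders, not taken from this thread, and the exact options may differ between Flink versions.

# Hypothetical benchmark driver: run the job N times and time each round,
# so the slower first (warm-up) round can be reported separately.
for i in $(seq 1 10); do
  start=$(date +%s)
  ./bin/flink run -m yarn-cluster -yn 4 ./target/my-benchmark-job.jar
  end=$(date +%s)
  echo "round $i: $((end - start)) s"
done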




Re: memory flush on cluster

Ufuk Celebi

On 23 Jun 2015, at 13:53, Stephan Ewen <[hidden email]> wrote:

> The system should not freeze after 100 runs. Can you tell us a bit more of what you see? Can you identify which process hangs and send us a stack-trace of that one? Then we could look into this...

If you have access to the task manager instances, you can run `jps` to get the PID of the TaskManager and then run `jstack <PID>`.

$ jps
16242 Jps
89107 TaskManager
$ jstack 89107
[stack trace]

It would be great if you could share this after the task managers freeze.
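
If it helps, here is a rough sketch for grabbing a few thread dumps and GC snapshots in a row once a task manager hangs (the output file names and the 10-second interval are just placeholders):

# Find the TaskManager PID and capture a few samples over time.
PID=$(jps | awk '/TaskManager/ {print $1}')
for i in 1 2 3; do
  jstack "$PID" > taskmanager-stack-$i.txt   # thread dump
  jstat -gcutil "$PID"                       # heap/GC utilization snapshot
  sleep 10
done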

Can you also provide some information about your setup (which job? how many task managers? etc.) so that I can try to reproduce this?