local disk cleanup after crash


Derek VerLee

I believe TaskManagers make an effort to clean up their local folders. However, I have noticed that in some cases the folders are not cleaned up and accumulate, eventually causing problems when the disk fills up. As far as I know, this only happens after crashes and other off-the-happy-path scenarios.

I am thinking of writing a script that cleans up these local folders and runs before the TaskManager starts, so that leftovers are removed between restarts after a crash.

Assuming local recovery is not configured, what should I delete and what should I leave around?

What should I keep if local recovery is configured?


Under the directories configured in "taskmanager.tmp.dirs" I see:

blobStore-*
flink-dist-cache-*
flink-io-*
localState/*
rocksdb-lib-*


Thanks

Re: local disk cleanup after crash

Gary Yao-4
Hi,

If no other TaskManager (TM) is running on the host, you can delete everything.
If multiple TMs share the same host, as far as I know, you will have to parse
the TM logs to find out which directories you can delete [1]. As for local
recovery, the local state of tasks that were running on a crashed TM is lost.
From the documentation [2]:

    If a task manager is lost, the local state from all its task is lost.

Therefore, assuming that only one TM is running on each host, you can delete
everything.
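
A minimal sketch of such a pre-start cleanup, in Python, under the assumptions stated above: exactly one TM per host, and that TM is stopped when the script runs. The glob patterns are copied from the directory listing in the question; the function names and the dry_run flag are hypothetical and not part of Flink. The caller passes in each directory configured in "taskmanager.tmp.dirs".

```python
import glob
import os
import shutil

# Patterns copied from the listing in the original question.
# Assumption: nothing else on the host uses these names in the tmp dir.
LEFTOVER_PATTERNS = [
    "blobStore-*",
    "flink-dist-cache-*",
    "flink-io-*",
    "localState/*",
    "rocksdb-lib-*",
]


def find_leftovers(tmp_dir):
    """Return the sorted paths under tmp_dir that match a known pattern."""
    matches = []
    for pattern in LEFTOVER_PATTERNS:
        # glob.escape protects against glob metacharacters in tmp_dir itself.
        matches.extend(glob.glob(os.path.join(glob.escape(tmp_dir), pattern)))
    return sorted(matches)


def clean_tmp_dir(tmp_dir, dry_run=True):
    """Delete matching leftovers; with dry_run=True only report them.

    Assumes no TaskManager is currently running on this host.
    Returns the list of paths that were (or would be) deleted.
    """
    leftovers = find_leftovers(tmp_dir)
    if dry_run:
        return leftovers
    for path in leftovers:
        if os.path.isdir(path) and not os.path.islink(path):
            shutil.rmtree(path, ignore_errors=True)
        else:
            os.remove(path)
    return leftovers
```

Running it once with dry_run=True and checking the returned paths before switching to dry_run=False seems prudent. Entries that do not match a pattern are left alone, and the localState directory itself is kept while its contents are removed. With several TMs on one host this sketch is not safe, for the reason given above.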

Best,
Gary

[1] http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/What-are-blobstore-files-and-why-do-they-keep-filling-up-tmp-directory-td26323.html
[2] https://ci.apache.org/projects/flink/flink-docs-stable/ops/state/large_state_tuning.html#task-local-recovery

On Thu, Mar 7, 2019 at 10:45 PM Derek VerLee <[hidden email]> wrote the message quoted in full above.