Hi there,

Today I realized that we currently have a lot of Flink distribution jar files that are never cleaned up, and I would like to know how to housekeep them properly. In the HDFS home directory of the user submitting the jobs, I find a subdirectory called `.flink` with hundreds of subfolders like `application_1573731655031_0420`, each with the following structure:

    -rw-r--r-- 3 dev dev 861 2020-01-27 21:17 /user/dev/.flink/application_1580155950981_0010/4797ff6e-853b-460c-81b3-34078814c5c9-taskmanager-conf.yaml
    -rw-r--r-- 3 dev dev 691 2020-01-27 21:16 /user/dev/.flink/application_1580155950981_0010/application_1580155950981_0010-flink-conf.yaml2755466919863419496.tmp
    -rw-r--r-- 3 dev dev 861 2020-01-27 21:17 /user/dev/.flink/application_1580155950981_0010/fdb5ef57-c140-4f6d-9791-c226eb1438ce-taskmanager-conf.yaml
    -rw-r--r-- 3 dev dev 92.2 M 2020-01-27 21:16 /user/dev/.flink/application_1580155950981_0010/flink-dist_2.11-1.9.1.jar
    drwxr-xr-x - dev dev 0 2020-01-27 21:16 /user/dev/.flink/application_1580155950981_0010/lib
    -rw-r--r-- 3 dev dev 2.6 K 2020-01-27 21:16 /user/dev/.flink/application_1580155950981_0010/log4j.properties
    -rw-r--r-- 3 dev dev 2.3 K 2020-01-27 21:16 /user/dev/.flink/application_1580155950981_0010/logback.xml
    drwxr-xr-x - dev dev 0 2020-01-27 21:16 /user/dev/.flink/application_1580155950981_0010/plugins

With tons of those folders (one for each Flink session we launched and killed in our CI/CD pipeline), they add up to several terabytes of used space in our HDFS.

I suppose I kill our Flink sessions the wrong way. We start and stop sessions and jobs separately, like so:

Start:

    ${OS_ROOT}/flink/bin/yarn-session.sh -jm 4g -tm 32g --name "${FLINK_SESSION_NAME}" -d -Denv.java.opts="-XX:+HeapDumpOnOutOfMemoryError"
    ${OS_ROOT}/flink/bin/flink run -m ${FLINK_HOST} [..savepoint/checkpoint options...] -d -n "${JOB_JAR}" $*

Stop:

    ${OS_ROOT}/flink/bin/flink stop -p ${SAVEPOINT_BASEDIR}/${FLINK_JOB_NAME} -m ${FLINK_HOST} ${ID}
    yarn application -kill "${ID}"

`yarn application -kill` was the best I could find, as the Flink documentation only says that the session process on the Linux host should be closed ("Stop the YARN session by stopping the unix process (using CTRL+C) or by entering ‘stop’ into the client.").

Now my questions: Is there a more elegant way to kill a YARN session (remotely from some host in the cluster, not necessarily the one that started the detached session) that also does the housekeeping? Or should I do the housekeeping manually? (Pretty easy to script.) Should I expect any other side effects when killing the session with `yarn application -kill`?

Best regards
Theo

--
SCOOP Software GmbH - Gut Maarhausen - Eiler Straße 3 P - D-51107 Köln
Theo Diefenthal
T +49 221 801916-196 - F +49 221 801916-17 - M +49 160 90506575
www.scoop-software.de
Registered office: Köln, Commercial register: Köln, Commercial register number: HRB 36625
Managing directors: Dr. Oleg Balovnev, Frank Heinen, Martin Müller-Rohde, Dr. Wolfgang Reddig, Roland Scheel
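Since manual housekeeping is described as easy to script, here is a minimal sketch of what such a script could look like. It is not from this thread and rests on assumptions: the `hdfs` and `yarn` CLIs are on the PATH, the staging directories sit directly under the submitting user's `.flink` directory (as in the listing above), and a directory may be deleted once `yarn application -status` reports its application as FINISHED, FAILED or KILLED. The functions `app_state`, `staging_dirs` and `cleanup_staging` and the `DRY_RUN` switch are hypothetical names.

```bash
#!/usr/bin/env bash
# Sketch: remove leftover Flink-on-YARN staging directories from HDFS.
# Assumed (hypothetical) policy: a directory named after a YARN application
# may be deleted once YARN reports that application as FINISHED, FAILED or KILLED.

# Print the YARN state of an application (e.g. RUNNING, KILLED), or nothing if
# the ResourceManager no longer knows it. Parses the "State : ..." line of
# `yarn application -status`.
app_state() {
  yarn application -status "$1" </dev/null 2>/dev/null \
    | awk -F':' '$1 ~ /^[ \t]*State[ \t]*$/ { gsub(/[ \t]/, "", $2); print $2 }'
}

# Print the application_* directories directly below an HDFS root, one per line.
staging_dirs() {
  hdfs dfs -ls "$1" </dev/null \
    | awk '$1 ~ /^d/ && $NF ~ /\/application_[0-9]+_[0-9]+$/ { print $NF }'
}

# Delete staging directories of terminated applications and report the rest.
# With DRY_RUN=1 nothing is deleted; only the intended actions are printed.
cleanup_staging() {
  local root="$1" dir app state
  while IFS= read -r dir; do
    app="${dir##*/}"
    state="$(app_state "$app")"
    case "$state" in
      FINISHED|FAILED|KILLED)
        if [ "${DRY_RUN:-0}" = 1 ]; then
          echo "would remove $dir ($state)"
        else
          # -skipTrash frees the space immediately instead of moving it to .Trash
          hdfs dfs -rm -r -skipTrash "$dir" </dev/null >/dev/null
          echo "removed $dir ($state)"
        fi
        ;;
      *)
        echo "kept $dir (${state:-UNKNOWN})"
        ;;
    esac
  done < <(staging_dirs "$root")
}
```

Running `DRY_RUN=1 cleanup_staging /user/dev/.flink` first shows what would be deleted. Directories whose application the ResourceManager no longer remembers are reported as UNKNOWN and kept, so they still need a manual decision.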
Hi Theo,

your assumption is correct: Flink won't clean up its files when you use `yarn application -kill ID`. The same should hold for other temporary files generated by Flink's blob service, shuffle service and I/O manager. These files are usually stored under /tmp, though, and should be cleaned up eventually.

I think a better approach is to reconnect to the Flink YARN session cluster and then issue the "stop" command. You can either run `bin/yarn-session.sh -id APP_ID` and then type "stop", or run `echo "stop" | bin/yarn-session.sh -id APP_ID`.

I think we should also update the logging statements of yarn-session.sh which say that you should use `yarn application -kill` to stop the process.

Cheers,
Till

On Tue, Jan 28, 2020 at 6:21 PM Theo Diefenthal wrote:
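A minimal sketch of how a CI/CD pipeline might apply the `echo "stop" | bin/yarn-session.sh -id APP_ID` suggestion from any host in the cluster, replacing the `yarn application -kill "${ID}"` step. It assumes the session was started with `--name "${FLINK_SESSION_NAME}"` as in Theo's start command, that this name is unique among running applications, and that `yarn application -list` prints its usual tab-separated table whose first two columns are Application-Id and Application-Name. `find_app_id` and `stop_session` are hypothetical helpers, not part of Flink.

```bash
#!/usr/bin/env bash
# Sketch: stop a detached Flink YARN session by name instead of killing it.

# Print the application ID(s) of RUNNING YARN applications named exactly $1.
# Assumes the tab-separated table printed by `yarn application -list`.
find_app_id() {
  yarn application -list -appStates RUNNING </dev/null 2>/dev/null \
    | awk -F'\t' -v name="$1" '
        {
          id = $1; n = $2
          gsub(/^ +| +$/, "", id)   # columns are space-padded
          gsub(/^ +| +$/, "", n)
        }
        id ~ /^application_[0-9]+_[0-9]+$/ && n == name { print id }'
}

# Reconnect to the session with yarn-session.sh and send "stop", so that Flink
# shuts the cluster down itself instead of YARN killing the containers.
stop_session() {
  local app_id
  app_id="$(find_app_id "$1")"
  case "$app_id" in
    "")
      echo "no running application named '$1'" >&2
      return 1
      ;;
    *$'\n'*)
      echo "more than one running application named '$1'" >&2
      return 1
      ;;
  esac
  echo "stop" | "${OS_ROOT:?}/flink/bin/yarn-session.sh" -id "$app_id"
}
```

In the stop phase, `flink stop ...` for the job stays as it is, followed by `stop_session "${FLINK_SESSION_NAME}"`. Refusing to act on ambiguous names is a deliberate choice, since stopping the wrong session in a shared cluster is worse than failing the pipeline step.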
Here is the corresponding JIRA ticket: https://issues.apache.org/jira/browse/FLINK-15806

On Wed, Jan 29, 2020 at 3:16 PM Till Rohrmann wrote: