Jars uploaded to taskmanager are deleted but not freed by OS


Jeroen Steggink | knowsy
Hi,

I'm having some trouble running the Flink taskmanager in a Docker
container (OpenShift). The container's internal storage is filling up,
probably because the deleted jar files in blob storage are still in use
and their disk space is therefore not freed.
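
To illustrate the suspected mechanism, here is a minimal sketch, assuming a Linux/POSIX file system: deleting a file that is still open only removes its directory entry, and the data stays on disk until the last handle is closed. The class name, method name, and temp-file prefix are hypothetical, and nothing here is taken from Flink's code; whether the jobmanager really keeps the jars open this way is not confirmed in this thread.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;

class DeletedButOpen {

    /**
     * Writes a temp file of the given size, opens it, deletes it while the
     * stream is still open, and then reads it through that stream.
     * Returns the number of bytes that could still be read after the delete.
     */
    static int readAfterDelete(int size) {
        try {
            Path file = Files.createTempFile("blob-demo", ".jar"); // hypothetical name
            Files.write(file, new byte[size]);
            try (InputStream in = Files.newInputStream(file)) {
                // Assumption: POSIX semantics. The delete succeeds and the
                // path disappears, but the open stream keeps the data alive.
                Files.delete(file);
                if (Files.exists(file)) {
                    throw new IllegalStateException("path still exists after delete");
                }
                int total = 0;
                byte[] buffer = new byte[8192];
                int n;
                while ((n = in.read(buffer)) != -1) {
                    total += n;
                }
                return total;
            } // only here, when the stream is closed, can the OS reclaim the space
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("bytes readable after delete: " + readAfterDelete(4096));
    }
}
```

If something similar happens with the uploaded jars, every batch would leave another unlinked but still open file behind, which would match a disk that fills up although the blob store directory looks empty.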

We are using Apache Beam to start an Apache Flink process, so the jars
are sent to Apache Flink every time we fire a batch.

I enabled debug logging, but I can't find anything showing
these deletes. Maybe someone has an idea why the resources are not freed?
I checked the blob store, and the files are indeed the jars.

208875129    0 lr-x------   1 1000150000 root           64 Apr 18 12:58 /proc/1/fd/142 -> /var/tmp/flink/blobStore-580cc38d-44e4-45a1-8922-e21c00d73dec/job_90964be94a2f4471844a00284e44fb32/blob_p-5202910b36af8c12548df97a7e4a057b77786217-ffa3f85003b1f124cd1cccdb0f72a8e0\ (deleted)

208875130    0 lr-x------   1 1000150000 root           64 Apr 18 12:58 /proc/1/fd/143 -> /var/tmp/flink/blobStore-580cc38d-44e4-45a1-8922-e21c00d73dec/job_b7c00268b488411a8f6e1af984bcdcc2/blob_p-5202910b36af8c12548df97a7e4a057b77786217-8bab07adb34d4ce8de20846ec72059ce\ (deleted)

208875131    0 lr-x------   1 1000150000 root           64 Apr 18 12:58 /proc/1/fd/144 -> /var/tmp/flink/blobStore-580cc38d-44e4-45a1-8922-e21c00d73dec/job_46183ac02f1dcd3543f8e481f59948b5/blob_p-5202910b36af8c12548df97a7e4a057b77786217-ac6bc86d8932e7d631416d9bafab4ab4\ (deleted)

208875132    0 lr-x------   1 1000150000 root           64 Apr 18 12:58 /proc/1/fd/145 -> /var/tmp/flink/blobStore-580cc38d-44e4-45a1-8922-e21c00d73dec/job_717bf3f4b3f80700c1cc44d6076c2aca/blob_p-5202910b36af8c12548df97a7e4a057b77786217-780dd2383dee11a2361ac20a5da7bbb8\ (deleted)

208875133    0 lr-x------   1 1000150000 root           64 Apr 18 12:58 /proc/1/fd/146 -> /var/tmp/flink/blobStore-580cc38d-44e4-45a1-8922-e21c00d73dec/job_22e67caac65c9c4e537caa3b072b8cc3/blob_p-5202910b36af8c12548df97a7e4a057b77786217-e0b523663672c641b368e5d1440b0b70\ (deleted)

208875134    0 lr-x------   1 1000150000 root           64 Apr 18 12:58 /proc/1/fd/147 -> /var/tmp/flink/blobStore-580cc38d-44e4-45a1-8922-e21c00d73dec/job_3afe5b02ccb95b3494a1acd8677c66f0/blob_p-5202910b36af8c12548df97a7e4a057b77786217-9a8cd48c09a4b518adf0309a0255b339\ (deleted)

208875135    0 lr-x------   1 1000150000 root           64 Apr 18 12:58 /proc/1/fd/148 -> /var/tmp/flink/blobStore-580cc38d-44e4-45a1-8922-e21c00d73dec/job_cb024c561531905e81c9768ec62a2fe0/blob_p-5202910b36af8c12548df97a7e4a057b77786217-0addc83aaf9a2f781528ad035fd79cc8\ (deleted)

208875136    0 lr-x------   1 1000150000 root           64 Apr 18 12:58 /proc/1/fd/149 -> /var/tmp/flink/blobStore-580cc38d-44e4-45a1-8922-e21c00d73dec/job_d3dc0b0608d71ffa77575771f088e80e/blob_p-5202910b36af8c12548df97a7e4a057b77786217-c9015b012ec4b249f32872471a31a500\ (deleted)

208875137    0 lr-x------   1 1000150000 root           64 Apr 18 12:58 /proc/1/fd/150 -> /var/tmp/flink/blobStore-580cc38d-44e4-45a1-8922-e21c00d73dec/job_1b4cdb127bb2c345e1b099e3e446bf58/blob_p-5202910b36af8c12548df97a7e4a057b77786217-ac4457b393b7ff0565c47c1e38786005\ (deleted)

208875138    0 lr-x------   1 1000150000 root           64 Apr 18 12:58 /proc/1/fd/151 -> /var/tmp/flink/blobStore-580cc38d-44e4-45a1-8922-e21c00d73dec/job_8c23503c614a88e8c8f7a54a31e41886/blob_p-5202910b36af8c12548df97a7e4a057b77786217-d096b3ef150bf7e8e98224e0b8f17292\ (deleted)

208875139    0 lr-x------   1 1000150000 root           64 Apr 18 12:58 /proc/1/fd/152 -> /var/tmp/flink/blobStore-580cc38d-44e4-45a1-8922-e21c00d73dec/job_e7c8132da483bd14e5abfe9390adeeb1/blob_p-5202910b36af8c12548df97a7e4a057b77786217-f370d8dcad0cb36581f9a5f1568e1487\ (deleted)

208875140    0 lr-x------   1 1000150000 root           64 Apr 18 12:58 /proc/1/fd/153 -> /var/tmp/flink/blobStore-580cc38d-44e4-45a1-8922-e21c00d73dec/job_cbee9f15b0c6adba0f5ddb67b587b607/blob_p-5202910b36af8c12548df97a7e4a057b77786217-9ae77c3419d77adab8f44258ca4290c5\ (deleted)

208875141    0 lr-x------   1 1000150000 root           64 Apr 18 12:58 /proc/1/fd/154 -> /var/tmp/flink/blobStore-580cc38d-44e4-45a1-8922-e21c00d73dec/job_29c5a145ae231be4c0d53717625c3938/blob_p-5202910b36af8c12548df97a7e4a057b77786217-76bb4d83f962a887d41effb2646bd63d\ (deleted)
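
A listing like the one above can also be produced from inside a JVM. The following is a rough sketch, assuming Linux with /proc mounted; the class and method names are hypothetical. It reads the symlinks in a file descriptor directory and keeps those whose target ends in " (deleted)", which is how the kernel marks unlinked files that are still open.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

class DeletedFdScanner {

    /**
     * Returns one "fd -> target" line for every descriptor in fdDir whose
     * symlink target is marked as deleted. Returns an empty list if fdDir
     * is not a directory (for example on systems without /proc).
     */
    static List<String> deletedTargets(Path fdDir) {
        List<String> result = new ArrayList<>();
        if (!Files.isDirectory(fdDir)) {
            return result;
        }
        try (DirectoryStream<Path> fds = Files.newDirectoryStream(fdDir)) {
            for (Path fd : fds) {
                try {
                    String target = Files.readSymbolicLink(fd).toString();
                    if (target.endsWith(" (deleted)")) {
                        result.add(fd.getFileName() + " -> " + target);
                    }
                } catch (IOException e) {
                    // The descriptor was closed in the meantime or is not a
                    // symlink; skip it.
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return result;
    }

    public static void main(String[] args) {
        // /proc/self/fd refers to the descriptors of the current process.
        for (String line : deletedTargets(Paths.get("/proc/self/fd"))) {
            System.out.println(line);
        }
    }
}
```

Run inside the affected process, this would show the same blob paths as the listing; pointing it at /proc/1/fd from another process requires permission to read that directory.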



There are several places in the code where the boolean returned by the
file delete is not checked, so we have no way of knowing whether the file
was deleted successfully. Maybe these calls could be changed to something
like java.nio.file.Files.delete, which throws an IOException when
something goes wrong. This is not a solution, but it would make failures
more transparent.
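
The difference between the two delete styles can be sketched as follows. This is only an illustration with hypothetical class, method, and file names, not code from Flink: java.io.File.delete reports failure through a boolean that is easy to ignore, while java.nio.file.Files.delete throws an exception that names the cause.

```java
import java.io.File;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

class DeleteReporting {

    /** Legacy style: a failure is only visible if the caller checks the boolean. */
    static boolean deleteLegacy(File file) {
        return file.delete();
    }

    /**
     * NIO style: returns null on success, otherwise the exception type and
     * message describing why the delete failed.
     */
    static String deleteNio(Path path) {
        try {
            Files.delete(path);
            return null;
        } catch (IOException e) {
            return e.getClass().getSimpleName() + ": " + e.getMessage();
        }
    }

    public static void main(String[] args) {
        // Hypothetical path that is not expected to exist.
        Path missing = Paths.get(System.getProperty("java.io.tmpdir"),
                "no-such-blob-" + System.nanoTime());
        System.out.println("File.delete returned: " + deleteLegacy(missing.toFile()));
        System.out.println("Files.delete reported: " + deleteNio(missing));
    }
}
```

Note that on Linux both calls succeed for a file that is still open elsewhere, so neither would have flagged the entries in the listing above; the exception only helps with genuine delete failures such as missing files or permission problems.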

Thanks,
Jeroen

Jars uploaded to jobmanager are deleted but not freed by OS

Jeroen Steggink | knowsy
Sorry, I meant the jobmanager, not the taskmanager.


On 18-Apr-18 15:44, Jeroen Steggink | knowsy wrote:

> Hi,
>
> I'm having some troubles running the Flink taskmanager in a Docker
> container (OpenShift). [...]