Hi,

I'm having some trouble running the Flink taskmanager in a Docker container (OpenShift). The container's internal storage is filling up, probably because the deleted jar files in blob storage are still in use and their space is therefore not freed.

We are using Apache Beam to start an Apache Flink process, so the jars are sent to Apache Flink every time we fire a batch.

I enabled debug logging, but I can't find anything showing these deletes. Maybe someone has an idea why the resources are not freed? I checked the blob store, and the files are indeed the jars.

208875129 0 lr-x------ 1 1000150000 root 64 Apr 18 12:58 /proc/1/fd/142 -> /var/tmp/flink/blobStore-580cc38d-44e4-45a1-8922-e21c00d73dec/job_90964be94a2f4471844a00284e44fb32/blob_p-5202910b36af8c12548df97a7e4a057b77786217-ffa3f85003b1f124cd1cccdb0f72a8e0\ (deleted)
208875130 0 lr-x------ 1 1000150000 root 64 Apr 18 12:58 /proc/1/fd/143 -> /var/tmp/flink/blobStore-580cc38d-44e4-45a1-8922-e21c00d73dec/job_b7c00268b488411a8f6e1af984bcdcc2/blob_p-5202910b36af8c12548df97a7e4a057b77786217-8bab07adb34d4ce8de20846ec72059ce\ (deleted)
208875131 0 lr-x------ 1 1000150000 root 64 Apr 18 12:58 /proc/1/fd/144 -> /var/tmp/flink/blobStore-580cc38d-44e4-45a1-8922-e21c00d73dec/job_46183ac02f1dcd3543f8e481f59948b5/blob_p-5202910b36af8c12548df97a7e4a057b77786217-ac6bc86d8932e7d631416d9bafab4ab4\ (deleted)
208875132 0 lr-x------ 1 1000150000 root 64 Apr 18 12:58 /proc/1/fd/145 -> /var/tmp/flink/blobStore-580cc38d-44e4-45a1-8922-e21c00d73dec/job_717bf3f4b3f80700c1cc44d6076c2aca/blob_p-5202910b36af8c12548df97a7e4a057b77786217-780dd2383dee11a2361ac20a5da7bbb8\ (deleted)
208875133 0 lr-x------ 1 1000150000 root 64 Apr 18 12:58 /proc/1/fd/146 -> /var/tmp/flink/blobStore-580cc38d-44e4-45a1-8922-e21c00d73dec/job_22e67caac65c9c4e537caa3b072b8cc3/blob_p-5202910b36af8c12548df97a7e4a057b77786217-e0b523663672c641b368e5d1440b0b70\ (deleted)
208875134 0 lr-x------ 1 1000150000 root 64 Apr 18 12:58 /proc/1/fd/147 -> /var/tmp/flink/blobStore-580cc38d-44e4-45a1-8922-e21c00d73dec/job_3afe5b02ccb95b3494a1acd8677c66f0/blob_p-5202910b36af8c12548df97a7e4a057b77786217-9a8cd48c09a4b518adf0309a0255b339\ (deleted)
208875135 0 lr-x------ 1 1000150000 root 64 Apr 18 12:58 /proc/1/fd/148 -> /var/tmp/flink/blobStore-580cc38d-44e4-45a1-8922-e21c00d73dec/job_cb024c561531905e81c9768ec62a2fe0/blob_p-5202910b36af8c12548df97a7e4a057b77786217-0addc83aaf9a2f781528ad035fd79cc8\ (deleted)
208875136 0 lr-x------ 1 1000150000 root 64 Apr 18 12:58 /proc/1/fd/149 -> /var/tmp/flink/blobStore-580cc38d-44e4-45a1-8922-e21c00d73dec/job_d3dc0b0608d71ffa77575771f088e80e/blob_p-5202910b36af8c12548df97a7e4a057b77786217-c9015b012ec4b249f32872471a31a500\ (deleted)
208875137 0 lr-x------ 1 1000150000 root 64 Apr 18 12:58 /proc/1/fd/150 -> /var/tmp/flink/blobStore-580cc38d-44e4-45a1-8922-e21c00d73dec/job_1b4cdb127bb2c345e1b099e3e446bf58/blob_p-5202910b36af8c12548df97a7e4a057b77786217-ac4457b393b7ff0565c47c1e38786005\ (deleted)
208875138 0 lr-x------ 1 1000150000 root 64 Apr 18 12:58 /proc/1/fd/151 -> /var/tmp/flink/blobStore-580cc38d-44e4-45a1-8922-e21c00d73dec/job_8c23503c614a88e8c8f7a54a31e41886/blob_p-5202910b36af8c12548df97a7e4a057b77786217-d096b3ef150bf7e8e98224e0b8f17292\ (deleted)
208875139 0 lr-x------ 1 1000150000 root 64 Apr 18 12:58 /proc/1/fd/152 -> /var/tmp/flink/blobStore-580cc38d-44e4-45a1-8922-e21c00d73dec/job_e7c8132da483bd14e5abfe9390adeeb1/blob_p-5202910b36af8c12548df97a7e4a057b77786217-f370d8dcad0cb36581f9a5f1568e1487\ (deleted)
208875140 0 lr-x------ 1 1000150000 root 64 Apr 18 12:58 /proc/1/fd/153 -> /var/tmp/flink/blobStore-580cc38d-44e4-45a1-8922-e21c00d73dec/job_cbee9f15b0c6adba0f5ddb67b587b607/blob_p-5202910b36af8c12548df97a7e4a057b77786217-9ae77c3419d77adab8f44258ca4290c5\ (deleted)
208875141 0 lr-x------ 1 1000150000 root 64 Apr 18 12:58 /proc/1/fd/154 -> /var/tmp/flink/blobStore-580cc38d-44e4-45a1-8922-e21c00d73dec/job_29c5a145ae231be4c0d53717625c3938/blob_p-5202910b36af8c12548df97a7e4a057b77786217-76bb4d83f962a887d41effb2646bd63d\ (deleted)

There are several places in the code where the boolean returned by the file delete is not checked, so we have no clue whether the file was deleted successfully. Maybe these calls could be changed to something like java.nio.file.Files.delete, which throws an IOException when something goes wrong. This is not a solution in itself, but it would make failures more transparent.

Thanks,
Jeroen
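To illustrate the suggestion about java.nio.file.Files.delete, here is a minimal sketch of the difference between the two delete calls. It is not Flink's code: the class name BlobDeleteSketch and the helpers deleteLegacy and deleteNio are hypothetical, and the temporary file only stands in for a blob. Note also that on Linux, deleting a file that a process still holds open normally succeeds (which matches the "(deleted)" entries in the listing), so neither call would report an error in that case; the space is only released once the descriptor is closed.

```java
import java.io.File;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.NoSuchFileException;
import java.nio.file.Path;

public class BlobDeleteSketch {

    // Legacy style: only a boolean comes back, with no reason for a failure,
    // and callers can silently ignore it.
    public static boolean deleteLegacy(File file) {
        return file.delete();
    }

    // NIO style: a failed delete surfaces as an IOException (or a subclass)
    // that describes what went wrong.
    public static String deleteNio(Path path) {
        try {
            Files.delete(path);
            return "deleted";
        } catch (NoSuchFileException e) {
            return "missing: " + e.getFile();
        } catch (IOException e) {
            return "failed: " + e;
        }
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical stand-in for a blob file.
        Path blob = Files.createTempFile("blob_p-", ".jar");

        System.out.println(deleteNio(blob));             // file exists at this point
        System.out.println(deleteNio(blob));             // file is already gone
        System.out.println(deleteLegacy(blob.toFile())); // boolean only, no cause
    }
}
```

Logging the exception message at the call site would at least show in the debug log whether a delete was attempted and why it failed.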
Sorry, I meant the jobmanager, not the taskmanager.
On 18-Apr-18 15:44, Jeroen Steggink | knowsy wrote:
> I'm having some troubles running the Flink taskmanager in a Docker
> container (OpenShift). [...]