Local blobStore not freed

Local blobStore not freed

Dede
Hi Team,

I have been struggling for a while with a strange issue: the local blob store files are not actually deleted from the job manager/task manager in versions 1.7.2 and 1.8.0. The lsof command shows them like this:
java    6528          root   63r   REG 202,16 162458786 1802248 /mnt/tmp1/blobStore-542fc202-b263-482f-87d7-11b5ad70cc32/job_b3446b24474ac3e107bbde27ff24df98/blob_p-96a29e8796d15ce6359edb4ab80ff8661f8b1fd0-73395221d4ffcfd603dbd1d25961aee3 (deleted)

The files are removed after a restart of the process, so I guess Flink itself is responsible for keeping a handle to the deleted file.

The same process works fine on 1.4.2, deleting all the files properly.

Is there something that I'm missing? I played around with the blob server configuration, but with no luck.

I can provide more logs/debug if needed.

Thanks,
Dan



Re: Local blobStore not freed

tison
Hi Dan,

You said "The files are removed after a restart of the process", so it sounds
like Flink cleaned up the blobs properly. From your description I don't clearly
understand in which situation you expected Flink to delete blobs but it didn't.

Could you describe the difference between 1.4.2 and 1.7.2/1.8.0 in detail?
In particular, at what moment exactly did you expect the blobs to be deleted when they were not?

Best,
tison.


In reply to the message Dede <[hidden email]> sent on Tue, Jun 11, 2019 at 11:06 PM.



Re: Local blobStore not freed

Dede
Thanks, Tison, for looking into it. What I tried to say is that Flink keeps handles to the deleted files open (hence the disk space is still occupied); this is visible in the lsof output.
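
For anyone who wants to check this from inside the JVM rather than with lsof, here is a minimal sketch. It assumes Linux with /proc mounted, where each entry in /proc/self/fd is a symlink whose target ends in " (deleted)" once the file has been unlinked. The class name DeletedHandleLister is hypothetical and not part of Flink.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of Flink: lists this JVM's open file
// descriptors whose target file has already been unlinked.
public class DeletedHandleLister {

    static List<String> deletedButOpen() throws IOException {
        List<String> result = new ArrayList<>();
        Path fdDir = Paths.get("/proc/self/fd");
        if (!Files.isDirectory(fdDir)) {
            return result; // assumption violated: not Linux, or /proc not mounted
        }
        try (DirectoryStream<Path> fds = Files.newDirectoryStream(fdDir)) {
            for (Path fd : fds) {
                try {
                    // Each entry is a symlink to the file, socket or pipe behind the descriptor.
                    String target = Files.readSymbolicLink(fd).toString();
                    if (target.endsWith(" (deleted)")) {
                        result.add(target);
                    }
                } catch (IOException e) {
                    // The descriptor was closed while we were iterating; skip it.
                }
            }
        }
        return result;
    }

    public static void main(String[] args) throws IOException {
        for (String target : deletedButOpen()) {
            System.out.println(target);
        }
    }
}
```

Calling deletedButOpen() from a debug hook in the affected process should list the same blob_p-... paths that lsof reports, as long as the handles are held by that JVM.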

From my investigations, after the job finishes, the local (and HA) blob store is deleted. The operation succeeds in both cases, but on our Linux machine the space remains occupied until the process is restarted.

I saw there were some improvements around this feature (https://cwiki.apache.org/confluence/display/FLINK/FLIP-19%3A+Improved+BLOB+storage+architecture), but from my debugging, the responsible code is in
org.apache.flink.util.FileUtils.deleteFileOrDirectoryInternal(File file)
which uses the newer Java NIO method
java.nio.file.Files.deleteIfExists(Path path)
whose Javadoc contains this comment:

     * <p> On some operating systems it may not be possible to remove a file when
     * it is open and in use by this Java virtual machine or other programs.
     *

There are a lot of changes compared with Flink 1.4, but I think the corresponding code is below (I'm pretty sure there are more things involved):
try {
    Files.delete(directory.toPath());
}
catch (NoSuchFileException ignored) {
    // if someone else deleted this concurrently, we don't mind
    // the result is the same for us, after all
}
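
To illustrate the behaviour described above, here is a small standalone sketch that does not use any Flink code. It assumes a Linux/POSIX file system, where Files.deleteIfExists succeeds on a file that is still open: the directory entry disappears, but the data stays readable, and the disk space stays allocated, until the last open handle is closed. The class name DeletedButOpenDemo and the temp file prefix are made up for the example.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;

// Hypothetical demo, not Flink code: delete a file while a stream on it is open.
public class DeletedButOpenDemo {

    /**
     * Writes payload to a temp file, deletes the file while an input stream
     * is still open on it, and reports whether the path is gone but the
     * content is still fully readable through the open stream.
     */
    static boolean readableAfterDelete(byte[] payload) throws IOException {
        Path blob = Files.createTempFile("blob_demo", ".bin");
        try {
            Files.write(blob, payload);
            try (InputStream in = Files.newInputStream(blob)) {
                boolean deleted = Files.deleteIfExists(blob); // unlinks the path
                boolean gone = !Files.exists(blob);           // path no longer resolves

                // Read everything back through the handle opened before the delete.
                ByteArrayOutputStream out = new ByteArrayOutputStream();
                byte[] buf = new byte[4096];
                int n;
                while ((n = in.read(buf)) != -1) {
                    out.write(buf, 0, n);
                }
                return deleted && gone && Arrays.equals(payload, out.toByteArray());
            } // closing the stream here is what finally releases the space
        } finally {
            Files.deleteIfExists(blob); // cleanup in case the delete above failed
        }
    }

    public static void main(String[] args) throws IOException {
        boolean ok = readableAfterDelete("hello blob".getBytes(StandardCharsets.UTF_8));
        System.out.println("readable after delete: " + ok);
    }
}
```

If this is what happens in the job manager/task manager, the delete call itself is behaving as expected, and the thing to look for is whichever stream or class loader still holds the blob file open after the job finishes.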


Thanks again for your time

Dan



 

In reply to the message Zili Chen <[hidden email]> sent on Wed, Jun 12, 2019 at 4:36 AM.