Hi all,

We're running Flink on a standalone five-node cluster. The /tmp/ directory keeps filling up with directories whose names start with blobstore--*. These directories are very large (approx. 1 GB), fill up the space very quickly, and the jobs then fail with a "No space left on device" error. The files in these directories appear to be some form of binary representation of the jobs running on the cluster.

What are these files, and how do I clean them up so they don't fill /tmp/ and cause jobs to fail?

Flink version: 1.4.2

Thanks,
Harshith
Hi Harshith,

the blob store files are needed to distribute the Flink job in your cluster. They should be cleaned up once the job has completed; the cleanup should be skipped only when the cluster crashes. Since Flink 1.4.2 is no longer actively supported, I would suggest upgrading to the latest Flink version and checking whether the problem still occurs.

Cheers,
Till

On Tue, Feb 26, 2019 at 2:48 AM Kumar Bolar, Harshith <[hidden email]> wrote:
Thanks Till,

It appears to occur when a TaskManager crashes and restarts: a new blob store directory gets created while the old one remains as is, and these pile up over time. Should these *old* blob stores be cleared manually every time a TaskManager crashes and restarts?

Regards,
Harshith
Yes, at the moment this does not happen automatically. When deleting the directories, you have to be careful not to delete the directory of a running TaskManager.

Cheers,
Till

On Wed, Feb 27, 2019 at 6:29 PM Kumar Bolar, Harshith <[hidden email]> wrote:
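Till's advice (delete old directories by hand, but never the one a running TaskManager uses) can be wrapped in a small helper. This is a minimal Python sketch, not something shipped with Flink. It assumes the blob store directories sit directly in /tmp and that their names start with "blobstore-" in any letter case, and that you already know which directories are still in use (passed as `in_use`). Other Flink processes on the same node may also own such a directory, so those belong in `in_use` too. The names `stale_blobstore_dirs` and `remove_stale_blobstore_dirs` are hypothetical.

```python
import shutil
from pathlib import Path

# Assumption: blob store directories sit directly in tmp_dir and their
# names start with "blobstore-" in any letter case.
PREFIX = "blobstore-"


def stale_blobstore_dirs(tmp_dir, in_use):
    """Return blob store directories in tmp_dir that are not listed in in_use."""
    keep = {Path(p).resolve() for p in in_use}
    stale = []
    for entry in sorted(Path(tmp_dir).iterdir()):
        # Skip plain files and symlinks; only real directories are candidates.
        if entry.is_symlink() or not entry.is_dir():
            continue
        if not entry.name.lower().startswith(PREFIX):
            continue
        # Never touch a directory that a running process still uses.
        if entry.resolve() in keep:
            continue
        stale.append(entry)
    return stale


def remove_stale_blobstore_dirs(tmp_dir, in_use, dry_run=True):
    """Delete stale blob store directories; with dry_run=True only print them."""
    stale = stale_blobstore_dirs(tmp_dir, in_use)
    for path in stale:
        if dry_run:
            print(f"would delete {path}")
        else:
            shutil.rmtree(path)
    return stale
```

The helper defaults to a dry run, so a first call only prints the candidates; passing `dry_run=False` deletes them. It has to be run on each node separately, because every node of a standalone cluster has its own local /tmp.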
Is there any way to figure out which directory is being used by a running TaskManager? Would it be safe to assume that it is the most recently created one?

Regards,
Harshith
Yes, this is one way. Another way is to look into the logs of the running TaskManagers; they should contain the path of the blob store directory.

Cheers,
Till

On Thu, Feb 28, 2019 at 12:04 PM Kumar Bolar, Harshith <[hidden email]> wrote:
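Following the log-based approach, the directories to keep can be collected by scanning the logs of the running TaskManagers. The Python sketch below assumes that the blob store path appears in the log as an absolute Unix path whose last component starts with "blobstore-" (any letter case), and that the log files are readable where the script runs. It does not rely on the exact wording of any Flink log message, so check a real log before trusting it. The helper names `blobstore_paths_from_log` and `blobstore_paths_from_logs` are hypothetical.

```python
import re
from pathlib import Path

# Assumption: the log mentions the blob store directory as an absolute
# Unix path whose last component starts with "blobstore-" (any case).
# Trailing subpaths such as ".../incoming" are cut off at the next "/".
_BLOBSTORE_PATH = re.compile(r"(/(?:[^\s/]+/)*blobstore-[\w-]+)", re.IGNORECASE)


def blobstore_paths_from_log(log_text):
    """Return the distinct blob store paths in a log, in order of appearance."""
    paths = []
    for match in _BLOBSTORE_PATH.finditer(log_text):
        path = match.group(1)
        if path not in paths:
            paths.append(path)
    return paths


def blobstore_paths_from_logs(log_files):
    """Collect blob store paths from several TaskManager log files."""
    found = set()
    for log_file in log_files:
        text = Path(log_file).read_text(errors="replace")
        found.update(blobstore_paths_from_log(text))
    return found
```

If one log file spans several TaskManager restarts, it may list more than one directory; keeping all of them errs on the safe side, since the worst case is that a stale directory survives one more cleanup round.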
Thanks a lot. Looking into the logs sounds like a much cleaner approach :-)