Hi,
we are running a Flink (1.1.2) standalone cluster with JM HA, and HDFS as the checkpoint and recovery storage directory. What we see is that blobStores are stored in HDFS as well as under the local /tmp directory of the JobManagers and TaskManagers.

Is this the expected behaviour? Is there any documentation on which blobs are stored locally and which are stored in HDFS in our case? In particular, we would need to know when it is safe to delete blobs stored locally, because they are not cleaned up by Flink and eventually fill up the /tmp partition.

Cheers,

Konstantin

--
Konstantin Knauf * [hidden email] * +49-174-3413182
TNG Technology Consulting GmbH, Betastr. 13a, 85774 Unterföhring
Managing Directors: Henrik Klagges, Christoph Stock, Dr. Robert Dahlke
Registered office: Unterföhring * Amtsgericht München * HRB 135082
On Fri, Sep 30, 2016 at 9:12 AM, Konstantin Knauf <[hidden email]> wrote:
> Is this the expected behaviour? Is there any documentation on which
> blobs are stored locally and which are stored in HDFS in our case?

BLOBs are copied to another directory in case of HA in order to be available for other job managers that might take over.

On regular termination (cancel, finish) all BLOBs should be cleaned up. With hard failures, it can happen that BLOBs are not cleaned up.

Do you know in which cases you see BLOBs not being cleaned up? If it is the first one, that sounds like a bug to me.

– Ufuk
Hi Ufuk,
thanks for your quick answer.

Setup: 2 servers, each running a JM as well as a TM

1) Removing all existing blobStores locally (/tmp) as well as on HDFS
2) Starting a Flink streaming job

Now there are the following BLOBs:

Local:

* Leader JM:

  4.0K  /tmp/blobStore-563a8820-9617-4d89-97a7-fc3cc258dff4/incoming
  64M   /tmp/blobStore-563a8820-9617-4d89-97a7-fc3cc258dff4
  64M   /tmp/blobStore-563a8820-9617-4d89-97a7-fc3cc258dff4/cache
  64M   /tmp/blobStore-c6b93d41-8916-4a8d-b595-6e35f0b10401
  64M   /tmp/blobStore-c6b93d41-8916-4a8d-b595-6e35f0b10401/cache

* Standby JM:

  64M   /tmp/blobStore-4cbfd3c0-2a70-4485-8fc0-045ca7f08cea
  64M   /tmp/blobStore-4cbfd3c0-2a70-4485-8fc0-045ca7f08cea/cache

HDFS:

  66595700 2016-09-30 13:03 <..>/flink/blob/cache/blob_da76e12b949a83404f97b6eb59416deaa31a907b

3) Cancelling both jobs via command line

Now there are the following BLOBs:

**same as above**

When starting the same job again, no new blobs are created.

Is it a problem to delete local blobStores of running jobs or will the blobs just be downloaded again from HDFS if needed?

Cheers,

Konstantin

Is it correct, that ea

On 30.09.2016 10:28, Ufuk Celebi wrote:
> Do you know in which cases you see BLOBs not being cleaned up? If it
> is the first one, that sounds like a bug to me.
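A listing like the one above can also be produced with a small script, which makes it easier to compare the state before and after cancelling a job. The following is only a minimal sketch: the helper names `dir_size_bytes` and `blob_store_sizes` are made up for illustration and are not part of Flink, and the only assumption taken from this thread is that the local stores are directories named `blobStore-*` directly under the temp directory.

```python
import os
from pathlib import Path


def dir_size_bytes(root: Path) -> int:
    """Sum the sizes of all regular files below root (symlinks are skipped)."""
    total = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                if not os.path.islink(path):
                    total += os.path.getsize(path)
            except OSError:
                # The file may have been removed concurrently; skip it.
                continue
    return total


def blob_store_sizes(tmp_dir: Path) -> dict:
    """Map each blobStore-* directory directly under tmp_dir to its size in bytes."""
    return {
        entry.name: dir_size_bytes(entry)
        for entry in sorted(tmp_dir.glob("blobStore-*"))
        if entry.is_dir()
    }
```

Calling `blob_store_sizes(Path("/tmp"))` on each JM/TM host returns one entry per local blobStore directory. The script only reads the directories and does not delete anything.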
Hi Ufuk,
any ideas? Any configuration that could be wrong?

Cheers,

Konstantin

On 30.09.2016 13:13, Konstantin Knauf wrote:
> Is it a problem to delete local blobStores of running jobs or will the
> blobs just be downloaded again from HDFS if needed?
Hi Konstantin,
This looks fine. Generally, it is fine to delete blobs in /tmp once the job is running or has finished. When the job is running, the Flink classloader has already opened these files. Thus, the file system keeps them available through the open file descriptors and defers the actual deletion until the descriptors are closed (at least on Unix-like systems).

When the job is finished, the blobs will be cleaned up after some time. In the latest master, we have changed this so that file descriptors are released immediately. In Flink 1.1.x, we still hold on to them until the job history is cleared from the web interface.

-Max

On Tue, Oct 4, 2016 at 4:54 PM, Konstantin Knauf <[hidden email]> wrote:
> any ideas? Any configuration that could be wrong?
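The file-descriptor behaviour described above can be checked independently of Flink. The snippet below is a minimal sketch of the operating-system semantics only, assuming a POSIX system (on Windows, deleting an open file typically fails). The function name `read_after_unlink` is made up for illustration, and the code does not reproduce how Flink's classloader handles its files.

```python
import os
import tempfile


def read_after_unlink(payload: bytes):
    """Write payload to a temp file, delete the file while it is still open,
    and return (path_still_exists, data_read_through_the_open_handle)."""
    fd, path = tempfile.mkstemp(prefix="blob_demo_")
    with os.fdopen(fd, "w+b") as handle:
        handle.write(payload)
        handle.flush()
        os.unlink(path)                    # remove the directory entry
        still_listed = os.path.exists(path)
        handle.seek(0)
        data = handle.read()               # content is still reachable via the descriptor
    # Once the handle is closed, the kernel can reclaim the disk space.
    return still_listed, data
```

On a Unix-like system the path disappears immediately, while the data stays readable until the handle is closed. This also means that the disk space is only freed once the last descriptor is released.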