(DEPRECATED) Apache Flink User Mailing List archive.

Blobstorage Locally and on HDFS


snntr

Blobstorage Locally and on HDFS

Hi,

we are running a Flink (1.1.2) standalone cluster with JM HA, and HDFS
as checkpoint and recovery storage dir. What we see is that blobStores
are stored in HDFS as well as in the local /tmp directories of the
JobManagers and TaskManagers.

Is this the expected behaviour? Is there any documentation on which
blobs are stored locally and which are stored in HDFS in our case? In
particular, we would need to know when it is safe to delete blobs stored
locally, because they are not cleaned up by Flink and eventually fill up
the /tmp partition.

Cheers,

Konstantin

--
Konstantin Knauf * [hidden email] * +49-174-3413182
TNG Technology Consulting GmbH, Betastr. 13a, 85774 Unterföhring
Geschäftsführer: Henrik Klagges, Christoph Stock, Dr. Robert Dahlke
Sitz: Unterföhring * Amtsgericht München * HRB 135082


Ufuk Celebi

Re: Blobstorage Locally and on HDFS

On Fri, Sep 30, 2016 at 9:12 AM, Konstantin Knauf
<[hidden email]> wrote:

> we are running a Flink (1.1.2) standalone cluster with JM HA, and HDFS
> as checkpoint and recovery storage dir. What we see is that blobStores
> are stored in HDFS as well as in the local /tmp directories of the
> JobManagers and TaskManagers.
>
> Is this the expected behaviour? Is there any documentation on which
> blobs are stored locally and which are stored in HDFS in our case? In
> particular, we would need to know when it is safe to delete blobs stored
> locally, because they are not cleaned up by Flink and eventually fill up
> the /tmp partition.

BLOBs are copied to another directory in case of HA in order to be
available for other job managers that might take over.

On regular termination (cancel, finish) all BLOBs should be cleaned
up. With hard failures, it can happen that BLOBs are not cleaned up.

Do you know in which cases you see BLOBs not being cleaned up? If it
is the first one, that sounds like a bug to me.

– Ufuk

snntr

Re: Blobstorage Locally and on HDFS

Hi Ufuk,

thanks for your quick answer.

Setup: 2 Servers, each running a JM as well as TM

1) Removing all existing blobstores locally (/tmp) as well as on HDFS
2) Starting a Flink streaming job

Now there are the following BLOBs:

Local:

*Leader JM:

4.0K /tmp/blobStore-563a8820-9617-4d89-97a7-fc3cc258dff4/incoming
64M /tmp/blobStore-563a8820-9617-4d89-97a7-fc3cc258dff4
64M /tmp/blobStore-563a8820-9617-4d89-97a7-fc3cc258dff4/cache
64M /tmp/blobStore-c6b93d41-8916-4a8d-b595-6e35f0b10401
64M /tmp/blobStore-c6b93d41-8916-4a8d-b595-6e35f0b10401/cache

*Standby JM:

64M /tmp/blobStore-4cbfd3c0-2a70-4485-8fc0-045ca7f08cea
64M /tmp/blobStore-4cbfd3c0-2a70-4485-8fc0-045ca7f08cea/cache

HDFS:

66595700 2016-09-30 13:03 <..>/flink/blob/cache/blob_da76e12b949a83404f97b6eb59416deaa31a907b

3) Canceling both jobs via command line:

Now there are the following BLOBs:

**same as above**

When starting the same job again, no new blobs are created.
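
As a rough cross-check of the listing above, the byte size of the blob on HDFS is consistent with the 64M reported for the local cache directories. The following is a small Python sketch, assuming the local sizes come from something like `du -h`, which reports binary units and rounds up. That each local cache holds exactly this one blob is an inference from the sizes, not something confirmed in this thread.

```python
import math

HDFS_BLOB_BYTES = 66595700  # size of the blob on HDFS, taken from the listing


def to_mib_rounded_up(num_bytes):
    """Convert a byte count to whole MiB, rounding up."""
    return math.ceil(num_bytes / 2**20)


if __name__ == "__main__":
    # 66595700 bytes is about 63.5 MiB, which rounds up to 64.
    print(to_mib_rounded_up(HDFS_BLOB_BYTES))
```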

Is it a problem to delete local blobStores of running jobs or will the
blobs just be downloaded again from HDFS if needed?

Cheers,

Konstantin

Is it correct, that ea

On 30.09.2016 10:28, Ufuk Celebi wrote:

> BLOBs are copied to another directory in case of HA in order to be
> available for other job managers that might take over.
>
> On regular termination (cancel, finish) all BLOBs should be cleaned
> up. With hard failures, it can happen that BLOBs are not cleaned up.
>
> Do you know in which cases you see BLOBs not being cleaned up? If it
> is the first one, that sounds like a bug to me.
>
> – Ufuk
>


snntr

Re: Blobstorage Locally and on HDFS

Hi Ufuk,

any ideas? Any configuration that could be wrong?

Cheers,

Konstantin

On 30.09.2016 13:13, Konstantin Knauf wrote:

> Is it a problem to delete local blobStores of running jobs or will the
> blobs just be downloaded again from HDFS if needed?

Maximilian Michels

Re: Blobstorage Locally and on HDFS

Hi Konstantin,

This looks fine. Generally it is fine to delete blobs in /tmp once the
job is running or has finished. While the job is running, the Flink
classloader has already opened these files, so the file system keeps
them available through the open file descriptors and defers the actual
deletion until the descriptors are closed (at least on Unix-like
systems). When the job is finished, the blobs will be cleaned up after
some time.
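
The deferred deletion described above can be reproduced outside Flink. The following is a minimal Python sketch, assuming a POSIX file system. The directory prefix, file name, and function name are made up for illustration and do not reflect Flink's actual blob layout or code.

```python
import os
import tempfile


def read_after_unlink(payload):
    """Delete a file while it is open, then read it through the open handle.

    Returns the bytes read and whether the path still exists after unlinking.
    Relies on POSIX semantics; on Windows, unlinking an open file usually fails.
    """
    tmp_dir = tempfile.mkdtemp(prefix="blob-demo-")  # hypothetical scratch dir
    path = os.path.join(tmp_dir, "blob_example")     # hypothetical file name
    with open(path, "wb") as f:
        f.write(payload)

    handle = open(path, "rb")  # stands in for a file held open by a classloader
    try:
        os.unlink(path)                      # removes the directory entry only
        still_listed = os.path.exists(path)  # the path is gone ...
        data = handle.read()                 # ... but the data is still readable
    finally:
        handle.close()  # closing the last descriptor lets the space be reclaimed
        os.rmdir(tmp_dir)
    return data, still_listed


if __name__ == "__main__":
    print(read_after_unlink(b"jar bytes"))
```

The disk space is only reclaimed once the last open descriptor is closed, which is why deleting the files of a running job does not immediately free the /tmp partition.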

In the latest master, we have changed this so that the file descriptors
are released immediately. In Flink 1.1.x we still hold on to them
until the job history is cleared from the web interface.
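
For housekeeping after hard failures, a small script can at least report which local blobStore directories have not been touched for a while before anyone removes them by hand. This is only a sketch: the `blobStore-` prefix is taken from the directory listing earlier in this thread, while the `/tmp` default, the age threshold, and the function name are assumptions. It deliberately deletes nothing.

```python
import os
import time


def stale_blob_dirs(base_dir="/tmp", min_age_hours=24.0, now=None):
    """Return blobStore-* directories whose newest entry is older than the threshold.

    The age is judged by modification time only, which is a heuristic.
    """
    now = time.time() if now is None else now
    cutoff = now - min_age_hours * 3600
    stale = []
    for name in sorted(os.listdir(base_dir)):
        path = os.path.join(base_dir, name)
        if not (name.startswith("blobStore-") and os.path.isdir(path)):
            continue
        newest = os.path.getmtime(path)
        for root, dirs, files in os.walk(path):
            for entry in dirs + files:
                try:
                    newest = max(newest, os.path.getmtime(os.path.join(root, entry)))
                except OSError:
                    pass  # entry vanished while scanning
        if newest < cutoff:
            stale.append(path)
    return stale


if __name__ == "__main__":
    for directory in stale_blob_dirs():
        print(directory)
```

Whether a reported directory is really safe to remove still depends on the points above; age alone does not prove that no job needs the blobs.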

-Max

On Tue, Oct 4, 2016 at 4:54 PM, Konstantin Knauf
<[hidden email]> wrote:

> Hi Ufuk,
>
> any ideas? Any configuration that could be wrong?
>
> Cheers,
>
> Konstantin