flink memory management / temp-io dir question

anand.gopinath

Hi,

I have a question about Flink memory management and spilling to /tmp.

 

In the docs (https://ci.apache.org/projects/flink/flink-docs-release-1.6/ops/config.html#configuring-temporary-io-directories) it says: Although Flink aims to process as much data in main memory as possible, it is not uncommon that more data needs to be processed than memory is available. Flink’s runtime is designed to write temporary data to disk to handle these situations....

 

In a Flink job that processes a couple of streams of 1M events in a windowed co-group function with parallelism 8, we see 8 directories created in /tmp holding hundreds of megabytes of data. Each directory name seems to correspond to the data of one parallel thread of the window operator behind the co-group.

 

e.g.

bash-4.2$ du -sh *
0       flink-dist-cache-a4a69215-665a-4c3c-8d90-416cbe192f26
352M    flink-io-9033517c-ac92-4baa-9e59-79bc80c72a9e
4.0K    localState
7.2M    rocksdb-lib-03d9460b15e6bf6af4f3d9b0ff7980c3

bash-4.2$ du -sh flink-io-9033517c-ac92-4baa-9e59-79bc80c72a9e/*
...
36M     flink-io-9033517c-ac92-4baa-9e59-79bc80c72a9e/job_cf2dca7843dd6b6296aa1a9d15a1d435_op_WindowOperator_014556c228cb5344d41861769d2bbbc1__1_8__uuid_93307150-4f62-4b06-a71e-0230360f7d86
36M     flink-io-9033517c-ac92-4baa-9e59-79bc80c72a9e/job_cf2dca7843dd6b6296aa1a9d15a1d435_op_WindowOperator_014556c228cb5344d41861769d2bbbc1__2_8__uuid_7b2f8957-7044-4bb3-869e-28843bd737a1
36M     flink-io-9033517c-ac92-4baa-9e59-79bc80c72a9e/job_cf2dca7843dd6b6296aa1a9d15a1d435_op_WindowOperator_014556c228cb5344d41861769d2bbbc1__3_8__uuid_54306a44-7e06-45ae-ba0e-4649887bca7e
...
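
The directory names in the listing above appear to follow the shape job_<job id>_op_<operator name>_<operator id>__<subtask>_<parallelism>__uuid_<uuid>. As a rough aid for matching directories to parallel subtasks, here is a minimal Python sketch that splits such a name into its parts. The pattern is inferred only from the three names shown in this message and is an assumption, not a documented Flink naming contract; `parse_dir_name` and `SubtaskDir` are hypothetical helper names.

```python
import re
from typing import NamedTuple, Optional

# Pattern inferred from the directory names in the listing above.
# This is an assumption about the layout, not a documented Flink contract.
DIR_PATTERN = re.compile(
    r"^job_(?P<job_id>[0-9a-f]{32})"
    r"_op_(?P<operator>.+)_(?P<operator_id>[0-9a-f]{32})"
    r"__(?P<subtask>\d+)_(?P<parallelism>\d+)"
    r"__uuid_(?P<uuid>[0-9a-f-]{36})$"
)


class SubtaskDir(NamedTuple):
    """Parts of one per-subtask directory name (hypothetical helper type)."""
    job_id: str
    operator: str
    operator_id: str
    subtask: int
    parallelism: int


def parse_dir_name(name: str) -> Optional[SubtaskDir]:
    """Return the parsed parts, or None if the name does not fit the pattern."""
    match = DIR_PATTERN.match(name)
    if match is None:
        return None
    return SubtaskDir(
        job_id=match["job_id"],
        operator=match["operator"],
        operator_id=match["operator_id"],
        subtask=int(match["subtask"]),
        parallelism=int(match["parallelism"]),
    )


if __name__ == "__main__":
    # One of the names from the listing above.
    example = (
        "job_cf2dca7843dd6b6296aa1a9d15a1d435_op_WindowOperator_"
        "014556c228cb5344d41861769d2bbbc1__1_8__uuid_"
        "93307150-4f62-4b06-a71e-0230360f7d86"
    )
    print(parse_dir_name(example))
```

Names that do not fit the inferred pattern (for example localState) yield None, so the helper can be applied to every entry of a directory listing without special-casing.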

 

I was wondering: can or should this spilling be avoided by increasing the TaskManager heap or changing another setting, or should I not worry about it?

Is there more information or documentation on how this data is used and cleaned up, and on what this spilling costs in terms of latency and checkpointing? Is there any impact I should be aware of?

 

thanks

Anand



Visit our website at http://www.ubs.com 

This message contains confidential information and is intended only
for the individual named. If you are not the named addressee you
should not disseminate, distribute or copy this e-mail. Please
notify the sender immediately by e-mail if you have received this
e-mail by mistake and delete this e-mail from your system.

E-mails are not encrypted and cannot be guaranteed to be secure or
error-free as information could be intercepted, corrupted, lost,
destroyed, arrive late or incomplete, or contain viruses. The sender
therefore does not accept liability for any errors or omissions in the
contents of this message which arise as a result of e-mail transmission.
If verification is required please request a hard-copy version. This
message is provided for informational purposes and should not be
construed as a solicitation or offer to buy or sell any securities
or related financial instruments.

UBS Limited is a company limited by shares incorporated in the United
Kingdom registered in England and Wales with number 2035362.  
Registered Office: 5 Broadgate, London EC2M 2QS
UBS Limited is authorised by the Prudential Regulation Authority
and regulated by the Financial Conduct Authority and the Prudential
Regulation Authority.

UBS AG is a public company incorporated with limited liability in
Switzerland domiciled in the Canton of Basel-City and the Canton of
Zurich respectively registered at the Commercial Registry offices in
those Cantons with new Identification No: CHE-101.329.561 as from 18
December 2013 (and prior to 18 December 2013 with Identification
No: CH-270.3.004.646-4) and having respective head offices at
Aeschenvorstadt 1, 4051 Basel and Bahnhofstrasse 45, 8001 Zurich,
Switzerland and is authorised and regulated by the Financial Market
Supervisory Authority in Switzerland.  Registered in the United
Kingdom as a foreign company with No: FC021146 and having a UK
Establishment registered at Companies House, Cardiff, with
No: BR 004507.  The principal office of UK Establishment:
5 Broadgate, London EC2M 2QS. In the United Kingdom, UBS AG is
authorised by the Prudential Regulation Authority and subject to
regulation by the Financial Conduct Authority and limited regulation
by the Prudential Regulation Authority.  Details about the extent
of our regulation by the Prudential Regulation Authority are
available from us on request.

UBS Business Solutions AG is a public company incorporated with
limited liability in Switzerland domiciled in the Canton of Zurich
registered at the Commercial Registry office with Identification
No: CHE-262.289.477 and having its head office at Bahnhofstrasse 45,
8001 Zurich, Switzerland.  Registered in the United Kingdom as a
foreign company with No: FC034139 and having a UK Establishment
registered at Companies House, Cardiff, with No: BR019277.  The
principal office of UK Establishment: 5 Broadgate London EC2M 2QS.  

UBS reserves the right to retain all messages. Messages are protected
and accessed only in legally justified cases.

Re: flink memory management / temp-io dir question

Kostas Kloudas
Hi Anand,

I think that Till is the best person to answer your question.

Cheers,
Kostas

On Oct 5, 2018, at 3:44 PM, [hidden email] wrote:


Re: flink memory management / temp-io dir question

Kostas Kloudas
Sorry, I forgot to cc Till.

On Oct 8, 2018, at 2:17 PM, Kostas Kloudas <[hidden email]> wrote:


Re: flink memory management / temp-io dir question

Till Rohrmann
Hi Anand,

Spilling to the I/O directories is only relevant for Flink's batch processing. It happens, for example, if you enable blocking data exchange and the produced data cannot be kept in memory. Moreover, it is used by many of Flink's out-of-core data structures to enable exactly this feature (e.g. the MutableHashTable, the MergeIterator, which combines sorted data that has been spilled, or the SorterMerger, which actually spills the data).

In streaming, Flink uses the RocksDB state backend to spill very large state gracefully to disk. Thus, you would need to configure RocksDB in order to control the spilling behaviour.

Cheers,
Till

On Mon, Oct 8, 2018 at 2:18 PM Kostas Kloudas <[hidden email]> wrote:

RE: flink memory management / temp-io dir question

anand.gopinath

Hi Till,

Thanks for the reply.

I don’t use batch, so I assume what I am seeing is streaming-related. I thought RocksDB writes to a different directory, though (as defined by checkpoint.data.uri)?

Regards,

Anand

 

From: Till Rohrmann [mailto:[hidden email]]
Sent: 08 October 2018 13:39
To: Kostas Kloudas
Cc: Gopinath, Anand; user; Till Rohrmann
Subject: Re: flink memory management / temp-io dir question

 


RE: flink memory management / temp-io dir question

anand.gopinath
In reply to this post by Till Rohrmann

It's OK, I see the relevant docs now:

 

"The RocksDBStateBackend holds in-flight data in a RocksDB database that is (per default) stored in the TaskManager data directories."

 

Thanks for your help

Anand

 

From: Gopinath, Anand
Sent: 11 October 2018 18:40
To: 'Till Rohrmann'; Kostas Kloudas
Cc: user; Till Rohrmann
Subject: RE: flink memory management / temp-io dir question

 

Hi Till,

Thanks for the reply.

I don’t use batch, so I assume what I am seeing is streaming related. I thought RocksDB writes to a different directory, though (as defined by checkpoint.data.uri)?

Regards,

Anand

 

From: Till Rohrmann [mailto:[hidden email]]
Sent: 08 October 2018 13:39
To: Kostas Kloudas
Cc: Gopinath, Anand; user; Till Rohrmann
Subject: Re: flink memory management / temp-io dir question

 

Hi Anand,

 

Spilling using the I/O directories is only relevant for Flink's batch processing. This happens, for example, if you enable blocking data exchange where the produced data cannot be kept in memory. Moreover, it is used by many of Flink's out-of-core data structures to enable exactly this feature (e.g. the MutableHashTable, the MergeIterator, which combines sorted data that has been spilled, and the SorterMerger, which actually spills the data).

 

In streaming, Flink uses the RocksDB state backend to spill very large state gracefully to disk. Thus, you would need to configure RocksDB in order to control the spilling behaviour.
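
As a rough illustration of what such a configuration could look like, here is a minimal flink-conf.yaml sketch. It is not taken from this thread: all paths are hypothetical placeholders, and the key names are the ones I understand the Flink 1.6 configuration reference to list, so please verify them against the docs for your version. The idea is that checkpoints go to a durable location, while the local RocksDB working files (which otherwise end up under the TaskManager temporary directories, e.g. /tmp) are pointed at a dedicated local disk.

```yaml
# Hypothetical flink-conf.yaml fragment; every path below is a placeholder.

# Use RocksDB as the state backend for streaming state.
state.backend: rocksdb

# Durable location for checkpoint data (assumed HDFS path).
state.checkpoints.dir: hdfs:///flink/checkpoints

# Local working directory for the RocksDB files on each TaskManager.
# If unset, RocksDB is expected to fall back to the TaskManager temp directories.
state.backend.rocksdb.localdir: /data/flink/rocksdb

# TaskManager temporary I/O directories (defaults to the system temp dir).
io.tmp.dirs: /data/flink/tmp
```

With a setup along these lines, the per-subtask WindowOperator directories would be expected to appear under the configured local directory rather than under /tmp.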

 

Cheers,

Till

 

On Mon, Oct 8, 2018 at 2:18 PM Kostas Kloudas <[hidden email]> wrote:

Sorry, I forgot to cc Till.

 

On Oct 8, 2018, at 2:17 PM, Kostas Kloudas <[hidden email]> wrote:

 

Hi Anand,

 

I think that Till is the best person to answer your question.

 

Cheers,

Kostas

 
