maximum size of window

vishnuviswanath
Hi All,

- Is there any restriction on the size of a window in Flink with respect to the memory of the nodes?
- What happens if a window grows larger than the memory of a node? Will it be split across multiple nodes?

If I am going to have a huge window, should I have fewer nodes with more memory?
Is there any documentation on how memory is managed in the case of windows, and also in the case of joins?

Regards,
Vishnu
Re: maximum size of window

Kostas Kloudas
Hi Vishnu,

I hope the following will help answer your question:

1) Elements are first split by key (apart from global windows) and then put into windows. In other words, windows are keyed.
2) A window belonging to a certain key is handled by a single node. In other words, no matter how big the window is, its
        state (the elements it contains) will never be split between two or more nodes.
3) Where the state is stored depends on your state backend. Currently Flink supports an in-memory one, a filesystem one, and
        a RocksDB one, which sits in the middle (first in memory and then on disk when needed). Of course, you can also implement your own.
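
Points 1) and 2) can be illustrated with a small sketch. This is plain Python, not Flink's API: `worker_for_key`, `NUM_WORKERS`, and the use of CRC32 as the hash are assumptions chosen for illustration, and Flink's actual key-to-node assignment differs in detail. It only shows that every element with the same key lands on the same worker, so that key's window state lives in one place.

```python
import zlib
from collections import defaultdict

NUM_WORKERS = 4  # hypothetical cluster size


def worker_for_key(key: str, num_workers: int = NUM_WORKERS) -> int:
    """Map a key to exactly one worker using a stable hash (illustrative only)."""
    return zlib.crc32(key.encode("utf-8")) % num_workers


def route(events):
    """Group (key, value) events into per-worker, per-key window buffers."""
    buffers = defaultdict(lambda: defaultdict(list))
    for key, value in events:
        buffers[worker_for_key(key)][key].append(value)
    return buffers


events = [("user-a", 1), ("user-b", 2), ("user-a", 3), ("user-c", 4), ("user-a", 5)]
buffers = route(events)

# Every key appears on exactly one worker, with all of its elements together.
workers_holding_a = [w for w, windows in buffers.items() if "user-a" in windows]
print(workers_holding_a, buffers[workers_holding_a[0]]["user-a"])
```

However many elements arrive for `user-a`, they all accumulate in the single buffer owned by one worker, which is why the size of one key's window is bounded by what that worker can hold.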

From the above, you can see that if you use the memory-backed state backend, then your window size is limited by the memory
available at each of your nodes. The fs state backend also keeps working state in memory and writes only its checkpoints to the
file system, so the same limit applies there. Finally, RocksDB will initially use RAM and then spill to disk when no more memory is available.

I should add that the window documentation is currently being rewritten to explain new features introduced in Flink 1.1,
which include more flexible handling of late events and more explicit state garbage collection.

So please stay tuned!

I hope this helps answer your question,
Kostas

Re: maximum size of window

vishnuviswanath
Hi Kostas,

Thank you.
Yes, 2) was exactly what I wanted to know.

- So if I am using RocksDB as the state backend, does that mean I don't have to worry much about the memory available per node, since RocksDB will use both RAM and disk to store the window state?

Regards,
Vishnu


Re: maximum size of window

Kostas Kloudas
Hi Vishnu,

RocksDB allows the window contents to be stored on disk when the state of a window becomes too big.
But when the window is triggered and your window function is applied to that big window,
all of its state is loaded into memory.

So although RocksDB lets you not worry about storage space during the window formation phase,
when it is time to fire your computation you have to consider how much RAM you have and whether the
window fits in it.
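
As a rough way to reason about this, the back-of-the-envelope check below estimates whether one window's contents would fit in memory at firing time. It is a sketch under stated assumptions: the function name, the element counts, the per-element size, and the `overhead_factor` are hypothetical numbers, not measurements or Flink settings.

```python
def window_fits_in_memory(num_elements: int,
                          avg_element_bytes: int,
                          available_bytes: int,
                          overhead_factor: float = 2.0) -> bool:
    """Estimate whether a single window can be materialized in RAM when it fires.

    overhead_factor is an assumed multiplier for object and deserialization
    overhead; replace it with a value taken from your own measurements.
    """
    needed = num_elements * avg_element_bytes * overhead_factor
    return needed <= available_bytes


GIB = 1024 ** 3

# Hypothetical: 50 million elements of ~200 bytes each in one key's window.
print(window_fits_in_memory(50_000_000, 200, 4 * GIB))  # about 20 GB needed
# Hypothetical: 1 million elements of ~200 bytes each.
print(window_fits_in_memory(1_000_000, 200, 4 * GIB))   # about 0.4 GB needed
```

The estimate applies per window, that is per key, not to the job as a whole, since that is the unit loaded when the window function runs.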

Regards,
Kostas
 
Re: maximum size of window

Aljoscha Krettek
Hi,
one thing to add: if you use a ReduceFunction or a FoldFunction for your window, the state will not grow with bigger window sizes or larger numbers of elements, because the result is computed eagerly. In that case, the state size depends only on the number of distinct keys.
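
The difference can be sketched in a few lines. This is plain Python rather than Flink code, and the class names `BufferingWindow` and `ReducingWindow` are hypothetical; they only contrast keeping every element until the window fires with folding each element into a running value as it arrives.

```python
class BufferingWindow:
    """Keeps every element until the window fires; state grows with the input."""

    def __init__(self):
        self.elements = []

    def add(self, value):
        self.elements.append(value)

    def fire(self, window_function):
        return window_function(self.elements)

    def state_size(self):
        return len(self.elements)


class ReducingWindow:
    """Folds each element into one running value; state stays at a single entry."""

    def __init__(self, reduce_function):
        self.reduce_function = reduce_function
        self.accumulator = None
        self.empty = True

    def add(self, value):
        if self.empty:
            self.accumulator, self.empty = value, False
        else:
            self.accumulator = self.reduce_function(self.accumulator, value)

    def fire(self):
        return self.accumulator

    def state_size(self):
        return 0 if self.empty else 1


buffering = BufferingWindow()
reducing = ReducingWindow(lambda a, b: a + b)
for value in range(1, 1001):
    buffering.add(value)
    reducing.add(value)

print(buffering.fire(sum), buffering.state_size())  # 1000 stored elements
print(reducing.fire(), reducing.state_size())       # 1 stored value
```

Both windows produce the same sum, but the reducing one holds a single value per key and window no matter how many elements were added.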

Cheers,
Aljoscha

Re: maximum size of window

vishnuviswanath
Hi,

Thank you for the responses.
I am not sure whether I will be able to use a Fold/Reduce function, but I will keep that in mind.

I have one more question: what is the implication of having a key that splits the data into windows of very small size (that is, a large number of small windows)?

Thanks and Regards,
Vishnu Viswanath

Re: maximum size of window

Aljoscha Krettek
Hi,
the result of splitting by key is that processing can easily be distributed among the workers because the windows for individual keys can be processed independently. This should improve cluster utilization.
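
To make this concrete, here is a small sketch of how many per-key windows each worker would own when there are many keys. It is plain Python, not Flink code: `windows_per_worker`, the `sensor-` key names, the worker count, and CRC32 as the hash are all assumptions for illustration.

```python
import zlib
from collections import Counter


def windows_per_worker(keys, num_workers):
    """Count how many per-key windows each worker would own (illustrative hashing)."""
    return Counter(zlib.crc32(k.encode("utf-8")) % num_workers for k in keys)


keys = [f"sensor-{i}" for i in range(10_000)]  # hypothetical: many small windows
load = windows_per_worker(keys, num_workers=8)

for worker in sorted(load):
    print(worker, load[worker])
```

With many distinct keys and a reasonable hash, the per-worker counts tend to come out similar, which is the sense in which a large number of small, independent windows helps utilization.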

Cheers,
Aljoscha

Re: maximum size of window

vishnuviswanath
Thank you!

--
Thanks and Regards,
Vishnu Viswanath