Internal buffers

classic Classic list List threaded Threaded
3 messages Options
Reply | Threaded
Open this post in threaded view
|

Internal buffers

leon_mclare
I have a question regarding how tuples are buffered between (possibly chained) subtasks.

Is it correct that there is a buffer for each vertex in the DAG of subtasks? Regardless of task slot sharing? If yes, then the primary optimization in this regard is operator chaining.

Furthermore, how do these buffers translate into overhead? Is there a send thread and a receive thread per buffer, similar to Apache Storm?

I could not find details concerning such buffers in the relevant subsection under Concepts.

Thanks in advance.
Reply | Threaded
Open this post in threaded view
|

Re: Internal buffers

Ufuk Celebi
There is this in the Wiki:
https://cwiki.apache.org/confluence/display/FLINK/Data+exchange+between+tasks

Buffers for data exchange come from the network buffer pool (by
default 2048 * 32KB buffers). They are distributed to the running
tasks and each logical channel between tasks needs at least one
buffer.

Tasks produce buffers, which are either consumed by the
a) NettyConnectionManager who has a Thread pool for network
communication shared by all tasks exchanging remote data (no Thread
per buffer), or
b) the consuming task thread (local exchange).

Chained operators run in a single task and exchange records without
serialization.

On Wed, Jun 1, 2016 at 11:54 AM,  <[hidden email]> wrote:

> I have a question regarding how tuples are buffered between (possibly
> chained) subtasks.
>
> Is it correct that there is a buffer for each vertex in the DAG of subtasks?
> Regardless of task slot sharing? If yes, then the primary optimization in
> this regard is operator chaining.
>
> Furthermore, how do these buffers translate into overhead? Is there a send
> thread and a receive thread per buffer, similar to Apache Storm?
>
> I could not find details concerning such buffers in the relevant subsection
> under Concepts.
>
> Thanks in advance.
Reply | Threaded
Open this post in threaded view
|

Re: Internal buffers

leon_mclare
Dear Ufuk,

the wiki entry is exactly what i was looking for. I found it quite complicated to understand on a first attempt but i will dedicate some more time for it in the future.

Thanks.

Regards
Leon

1. Jun 2016 13:06 by [hidden email]:

There is this in the Wiki:
https://cwiki.apache.org/confluence/display/FLINK/Data+exchange+between+tasks

Buffers for data exchange come from the network buffer pool (by
default 2048 * 32KB buffers). They are distributed to the running
tasks and each logical channel between tasks needs at least one
buffer.

Tasks produce buffers, which are either consumed by the
a) NettyConnectionManager who has a Thread pool for network
communication shared by all tasks exchanging remote data (no Thread
per buffer), or
b) the consuming task thread (local exchange).

Chained operators run in a single task and exchange records without
serialization.

On Wed, Jun 1, 2016 at 11:54 AM, <[hidden email]> wrote:
I have a question regarding how tuples are buffered between (possibly
chained) subtasks.

Is it correct that there is a buffer for each vertex in the DAG of subtasks?
Regardless of task slot sharing? If yes, then the primary optimization in
this regard is operator chaining.

Furthermore, how do these buffers translate into overhead? Is there a send
thread and a receive thread per buffer, similar to Apache Storm?

I could not find details concerning such buffers in the relevant subsection
under Concepts.

Thanks in advance.