(DEPRECATED) Apache Flink User Mailing List archive.

Internal buffers

Classic

List

Threaded

3 messages Options

leon_mclare

Internal buffers

I have a question regarding how tuples are buffered between (possibly chained) subtasks.

Is it correct that there is a buffer for each vertex in the DAG of subtasks? Regardless of task slot sharing? If yes, then the primary optimization in this regard is operator chaining.

Furthermore, how do these buffers translate into overhead? Is there a send thread and a receive thread per buffer, similar to Apache Storm?

I could not find details concerning such buffers in the relevant subsection under Concepts.

Thanks in advance.

Ufuk Celebi

Re: Internal buffers

There is this in the Wiki:
https://cwiki.apache.org/confluence/display/FLINK/Data+exchange+between+tasks

Buffers for data exchange come from the network buffer pool (by
default 2048 * 32KB buffers). They are distributed to the running
tasks and each logical channel between tasks needs at least one
buffer.

Tasks produce buffers, which are either consumed by the
a) NettyConnectionManager who has a Thread pool for network
communication shared by all tasks exchanging remote data (no Thread
per buffer), or
b) the consuming task thread (local exchange).

Chained operators run in a single task and exchange records without
serialization.

On Wed, Jun 1, 2016 at 11:54 AM, <[hidden email]> wrote:

> I have a question regarding how tuples are buffered between (possibly
> chained) subtasks.
>
> Is it correct that there is a buffer for each vertex in the DAG of subtasks?
> Regardless of task slot sharing? If yes, then the primary optimization in
> this regard is operator chaining.
>
> Furthermore, how do these buffers translate into overhead? Is there a send
> thread and a receive thread per buffer, similar to Apache Storm?
>
> I could not find details concerning such buffers in the relevant subsection
> under Concepts.
>
> Thanks in advance.

leon_mclare

Re: Internal buffers

Dear Ufuk,

the wiki entry is exactly what i was looking for. I found it quite complicated to understand on a first attempt but i will dedicate some more time for it in the future.

Thanks.

Regards
Leon

1. Jun 2016 13:06 by [hidden email]:

There is this in the Wiki:
https://cwiki.apache.org/confluence/display/FLINK/Data+exchange+between+tasks

Buffers for data exchange come from the network buffer pool (by
default 2048 * 32KB buffers). They are distributed to the running
tasks and each logical channel between tasks needs at least one
buffer.

Tasks produce buffers, which are either consumed by the
a) NettyConnectionManager who has a Thread pool for network
communication shared by all tasks exchanging remote data (no Thread
per buffer), or
b) the consuming task thread (local exchange).

Chained operators run in a single task and exchange records without
serialization.

On Wed, Jun 1, 2016 at 11:54 AM, <[hidden email]> wrote:
I have a question regarding how tuples are buffered between (possibly
chained) subtasks.

Is it correct that there is a buffer for each vertex in the DAG of subtasks?
Regardless of task slot sharing? If yes, then the primary optimization in
this regard is operator chaining.

Furthermore, how do these buffers translate into overhead? Is there a send
thread and a receive thread per buffer, similar to Apache Storm?

I could not find details concerning such buffers in the relevant subsection
under Concepts.

Thanks in advance.