Manipulating Processing elements of Network Buffers

Manipulating Processing elements of Network Buffers

m@xi
Hello Flinkers!

I know that one should set the number of Network Buffers (NB) appropriately
for a Flink deployment. Beyond that, I am wondering whether one can
change the order of the data records inside an NB in order to
optimize the performance of an application.

For instance, let's assume that an NB currently holds 3 elements {a,b,c} in
this specific order. The data is going to be shipped to one or more
taskmanagers for further processing. But maybe if these elements were
shipped in another order, e.g. {b,c,a}, then a specific task would run
faster.

Is there any way to manipulate the ordering in the NB, or the order in
which tuples arrive at the input of an operator?

Thanks in advance.

Best,
Max



--
Sent from: http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/

Re: Manipulating Processing elements of Network Buffers

Till Rohrmann
Hi Max,

The network buffer order is quite important at the moment, because the network stream transports not only data but also control events such as checkpoint barriers. To guarantee that you don't lose data in case of a failure, it is (at the moment) strictly necessary that, for example, checkpoint barriers don't overtake data records. Moreover, large records might span multiple memory buffers. Therefore, reordering at the network buffer level is unlikely to be all that useful.
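
To make the barrier constraint concrete, here is a toy sketch in plain Java. It is not Flink's network stack and not a hook Flink exposes; the class name, the method name and the BARRIER token are all made up for illustration. It only shows the invariant: records may change places inside a segment, but nothing crosses a barrier.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

/** Toy model only: a stream of string records with barrier tokens mixed in. */
final class BarrierAwareReorder {
    /** Hypothetical stand-in for a checkpoint barrier in this toy stream. */
    static final String BARRIER = "|barrier|";

    /**
     * Sorts records only inside each barrier-delimited segment.
     * Barriers keep their position, so no record moves across one.
     */
    static List<String> reorderWithinSegments(List<String> stream, Comparator<String> order) {
        List<String> out = new ArrayList<>();
        List<String> segment = new ArrayList<>();
        for (String element : stream) {
            if (BARRIER.equals(element)) {
                // Flush the records seen so far, then emit the barrier itself.
                segment.sort(order);
                out.addAll(segment);
                segment.clear();
                out.add(element);
            } else {
                segment.add(element);
            }
        }
        // Flush whatever follows the last barrier.
        segment.sort(order);
        out.addAll(segment);
        return out;
    }

    public static void main(String[] args) {
        List<String> stream = Arrays.asList("c", "a", BARRIER, "b", "a");
        // prints [a, c, |barrier|, a, b]
        System.out.println(reorderWithinSegments(stream, Comparator.<String>naturalOrder()));
    }
}
```

In this model the records before the barrier still all arrive before it, which is the property that matters for checkpoints.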

Instead, what you can always do is sort the elements in your user function. The price you pay is that you have to buffer the elements in the meantime and also checkpoint them.
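
A minimal sketch of the buffer-and-sort idea, written as plain Java without any Flink dependency so that it runs on its own. SortingBuffer, batchSize, snapshot() and restore() are hypothetical names chosen for this example, and the fixed batch size is just one possible flush criterion.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;

/** Hypothetical, Flink-free stand-in for a user function that buffers and sorts its input. */
final class SortingBuffer<T> {
    private final int batchSize;
    private final Comparator<? super T> order;
    private List<T> buffered = new ArrayList<>();

    SortingBuffer(int batchSize, Comparator<? super T> order) {
        if (batchSize < 1) {
            throw new IllegalArgumentException("batchSize must be >= 1");
        }
        this.batchSize = batchSize;
        this.order = order;
    }

    /** Buffers one element; returns a sorted batch once batchSize elements are held, else an empty list. */
    List<T> add(T element) {
        buffered.add(element);
        if (buffered.size() < batchSize) {
            return Collections.emptyList();
        }
        List<T> batch = buffered;
        buffered = new ArrayList<>();
        batch.sort(order);
        return batch;
    }

    /** Copy of the pending elements: this is what would have to go into a checkpoint. */
    List<T> snapshot() {
        return new ArrayList<>(buffered);
    }

    /** Puts pending elements back after a failure. */
    void restore(List<T> state) {
        buffered = new ArrayList<>(state);
    }

    public static void main(String[] args) {
        SortingBuffer<String> buffer = new SortingBuffer<>(3, Comparator.<String>naturalOrder());
        System.out.println(buffer.add("c")); // prints []
        System.out.println(buffer.add("a")); // prints []
        System.out.println(buffer.add("b")); // prints [a, b, c]
    }
}
```

In an actual Flink job the pending elements would presumably live in managed state (for example a ListState accessed through the CheckpointedFunction interface) rather than in a plain field, so that Flink checkpoints them for you. The first two calls above return nothing, which is the added latency you pay for the ordering.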

Cheers,
Till

On Thu, Feb 15, 2018 at 3:13 PM, m@xi <[hidden email]> wrote:


Re: Manipulating Processing elements of Network Buffers

m@xi
Hi Till!

Thanks a lot for your useful reply.

So now I get it. I should not manipulate or disturb the network buffer
contents, as this would cause other problems. On the other
hand, if I first buffer the data in my operator, sort it
by some criterion, and only then process it, what is the
impact on the efficiency/effectiveness of a streaming algorithm?

I mean, Flink is "pure" streaming, though not quite pure because of the
network buffers. So if I add another layer of buffering inside each
operator, this will make my application slower, and it is no longer
streaming; it becomes batch.

Thanks in advance.

Best,
Max


