Hello Flinkers!
I know that one should set the number of network buffers (NB) for a Flink deployment appropriately. Beyond that, I am wondering whether one can change the order of the data records inside a network buffer in order to optimize the performance of an application.

For instance, let's assume that a network buffer currently holds three elements {a,b,c}, in this specific order. The data is going to be shipped to one or more TaskManagers for further processing. But if these elements were shipped in another order, e.g. {b,c,a}, then a specific task might run faster.

Is there any way to manipulate the ordering within a network buffer, or the order in which tuples arrive at the input of an operator?

Thanks in advance.

Best,
Max

--
Sent from: http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/
Hi Max,

the network buffer order is quite important at the moment, because the network stream does not only transport data but also control events such as the checkpoint barriers. In order to guarantee that you don't lose data in case of a failure, it is (at the moment) strictly necessary, for example, that checkpoint barriers don't overtake data records. Moreover, records might span multiple memory buffers if they are large. Therefore, it might not be all that useful to do this ordering on the network buffer level.

Instead, what you can always do is sort the elements in your user function. The price you have to pay for this is that you have to buffer the elements in between and also checkpoint them.

Cheers,
Till

On Thu, Feb 15, 2018 at 3:13 PM, m@xi <[hidden email]> wrote:
> Hello Flinker!
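To make the "sort in your user function" suggestion concrete, here is a minimal plain-Java sketch of the buffer-then-sort idea. It deliberately uses no Flink API: the class `SortingBuffer`, its `capacity` flush threshold, and the `snapshot()` method are hypothetical names chosen for illustration, and the count-based flush is only one possible trigger. In an actual Flink job the pending records would have to be kept in checkpointed state, which is the cost mentioned above.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.function.Consumer;

/**
 * Plain-Java sketch (no Flink dependency) of "buffer, sort, then process"
 * inside a user function. All names here are hypothetical.
 */
class SortingBuffer<T> {
    private final int capacity;                    // flush after this many records
    private final Comparator<? super T> order;     // order the processing logic prefers
    private final Consumer<? super T> downstream;  // stands in for the actual processing

    // Assumption: in a real Flink job this list would live in checkpointed
    // state, otherwise buffered records would be lost on failure.
    private final List<T> buffer = new ArrayList<>();

    SortingBuffer(int capacity, Comparator<? super T> order, Consumer<? super T> downstream) {
        if (capacity < 1) {
            throw new IllegalArgumentException("capacity must be >= 1");
        }
        this.capacity = capacity;
        this.order = order;
        this.downstream = downstream;
    }

    /** Called once per arriving record, in arrival order. */
    void onElement(T record) {
        buffer.add(record);
        if (buffer.size() >= capacity) {
            flush();
        }
    }

    /** Sorts the buffered records, hands them downstream, then clears the buffer. */
    void flush() {
        buffer.sort(order);
        for (T record : buffer) {
            downstream.accept(record);
        }
        buffer.clear();
    }

    /** Copy of the pending records, i.e. what a checkpoint would need to persist. */
    List<T> snapshot() {
        return new ArrayList<>(buffer);
    }

    public static void main(String[] args) {
        List<String> processed = new ArrayList<>();
        // Preferred order from the question in this thread: b, c, a.
        List<String> preferred = List.of("b", "c", "a");
        Comparator<String> byPreference = Comparator.comparingInt(preferred::indexOf);

        SortingBuffer<String> op = new SortingBuffer<>(3, byPreference, processed::add);
        for (String s : List.of("a", "b", "c")) {  // arrival order {a,b,c}
            op.onElement(s);
        }
        System.out.println(processed);  // prints [b, c, a]
    }
}
```

Note that nothing reaches the processing logic until the buffer fills (or `flush()` is called explicitly), so every record waits for up to `capacity - 1` later arrivals. That delay, plus the state to checkpoint, is the trade-off of reordering at the operator level.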
Hi Till!
Thanks a lot for your useful reply. So now I get it: I should not manipulate or disturb the network buffer contents, as this would trigger other problematic behaviour.

On the other hand, regarding the price of first buffering the data in my operator, e.g. sorting it based on some criterion, and only then processing it: what is its impact on the efficiency/effectiveness of a streaming algorithm? I mean, Flink is "pure" streaming, but not-so-pure due to the network buffers. So if I add another layer of buffering inside each operator, won't this make my application slower? And isn't this no longer streaming, but batch?

Thanks in advance.

Best,
Max