https://data-artisans.com/blog/how-flink-handles-backpressure and got the idea, but I would like to know more details. Let's consider the org.apache.flink.streaming.api.functions.source.SourceFunction.SourceContext interface and its void collect(T element) method. Is the back pressure mechanism going to block the thread calling collect for some time? How does that relate to what is written in the mentioned article? I don't quite understand how 'The output side never puts too much data on the wire by a simple watermark mechanism' is supposed to work.
Pawel
Hi Pawel,

On the sender side, data is transferred as follows:

operator collects record --> serialize to Flink buffer --> copy to Netty buffer --> flush to socket

On the receiver side:

socket --> Netty --> Flink buffer --> deserialize to record --> operator processes record

On the receiver side, if the operator processes records slowly, the limited pool of Flink buffers will be exhausted. The Netty thread can then no longer request a Flink buffer, so it temporarily switches off reading from the channel. As a result, data accumulates in the socket on the receiver side, which back-pressures the sender through the TCP mechanism.

On the sender side, because of TCP back pressure the socket stops sending data to the receiver, and data gradually accumulates there. We configure min and max watermarks on the Netty side to limit both the in-flight data and the consumption of Netty buffers. For example, if we define 2 Flink buffers as the max watermark in Netty, the Netty thread can copy at most 2 Flink buffers until they have been flushed to the socket. If the socket is full because of TCP back pressure from the receiver, the Netty thread stops consuming Flink buffers once it reaches the max watermark. Once the collected records have used up all of the limited Flink buffers, no buffers are available any more, and the collect(T element) method you mentioned blocks while requesting a Flink buffer.

The whole process is a bit complicated, but I hope this helps.

BTW, from the Flink 1.5 release, network flow control was changed to a classic credit-based mechanism. This means the sender transfers buffers only according to the available buffers announced by the receiver and does not send any extra data, so no in-flight data accumulates on the wire.

Zhijiang
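The sender-side blocking described in the reply can be illustrated with a small stand-alone sketch. This is not Flink's implementation: the class BufferPoolSketch and its methods are hypothetical names, and a bounded java.util.concurrent queue merely stands in for the limited pool of Flink buffers. It only shows the general idea that a collect-style call blocks once every buffer is in use and resumes when the downstream side frees one.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

// Hypothetical stand-in for a limited pool of network buffers (not a Flink class).
public class BufferPoolSketch {
    private final BlockingQueue<String> buffers;

    public BufferPoolSketch(int capacity) {
        this.buffers = new ArrayBlockingQueue<>(capacity);
    }

    // Analogous to collect(T element): blocks while no free buffer is available.
    public void collect(String record) throws InterruptedException {
        buffers.put(record);
    }

    // Non-blocking probe: returns false where collect() would have blocked.
    public boolean tryCollect(String record) {
        return buffers.offer(record);
    }

    // Downstream side frees one buffer, waiting until one is occupied.
    public String flushOne() throws InterruptedException {
        return buffers.take();
    }

    // Non-blocking variant: returns null if no buffer is occupied.
    public String tryFlushOne() {
        return buffers.poll();
    }

    public static void main(String[] args) throws InterruptedException {
        BufferPoolSketch pool = new BufferPoolSketch(2); // only 2 buffers, as in the example above

        Thread source = new Thread(() -> {
            try {
                for (int i = 0; i < 5; i++) {
                    pool.collect("record-" + i); // blocks whenever both buffers are in use
                    System.out.println("collected record-" + i);
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        source.start();

        // Slow downstream: frees one buffer every 100 ms, which unblocks the source.
        for (int i = 0; i < 5; i++) {
            Thread.sleep(100);
            System.out.println("flushed " + pool.flushOne());
        }
        source.join();
    }
}
```

Running main, the source thread gets ahead by at most two records and then proceeds only as fast as the slow consumer flushes, which is the back pressure effect seen by the thread calling collect.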