For an operator whose input stream is faster than its output stream, the input buffer fills up and blocks the previous operator's output thread, which transfers the data to this operator. Is that right?
Do Flink and Spark both handle backpressure by blocking the thread? If so, what is the difference between them? A data source produces data continuously, so what happens if its output thread is blocked? Would the buffer overflow?
Hi,

some time ago I found a problem with backpressure in Spark and prepared a simple test to check it and compare it with Flink.
https://github.com/rssdev10/spark-kafka-streaming+
https://mail-archives.apache.org/mod_mbox/spark-user/201607.mbox/%3CCA+AWphp=2VsLrgSTWFFknw_KMbq88fZhKfvugoe4YYByEt7a=w@...%3E

2016-09-02 15:07 GMT+03:00 jiecxy <[hidden email]>:
For an operator, the input stream is faster than its output stream, so its
That's true. It works in Flink because a slow downstream operator back-pressures its upstream operator, which then slows down. Technically, this relies on the fact that Flink uses a bounded pool of network buffers. A sending operator writes data into network buffers, and a buffer becomes free for reuse once its data has been sent. If a downstream operator is slow to process the network buffers it has received, the upstream operator blocks until more network buffers become available.

Cheers,
Aljoscha

On Fri, 2 Sep 2016 at 21:57 rss rss <[hidden email]> wrote:
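
The bounded-buffer mechanism described in the reply above can be illustrated with a minimal sketch in plain Java. It does not use Flink's API or reproduce its network stack: the class `BackpressureSketch`, the method `run`, and the queue names are hypothetical, and two `ArrayBlockingQueue`s merely stand in for the pool of free buffers and the buffers in transit. Under those assumptions, a fast sender has to wait in `take()` whenever the slow receiver has not yet recycled a buffer, so nothing overflows; the sender simply slows down.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical illustration only; these are not Flink classes.
class BackpressureSketch {

    /**
     * Runs a fast sender against a slow receiver that share a bounded
     * buffer pool, and returns the largest number of buffers that were
     * ever in transit at the same time.
     */
    static int run(int poolSize, int records) {
        // Bounded pool of free "network buffers" (assumed stand-in).
        BlockingQueue<int[]> freeBuffers = new ArrayBlockingQueue<>(poolSize);
        for (int i = 0; i < poolSize; i++) {
            freeBuffers.add(new int[1]);
        }
        // Buffers that were sent but not yet processed downstream.
        BlockingQueue<int[]> inTransit = new ArrayBlockingQueue<>(poolSize);
        AtomicInteger maxInTransit = new AtomicInteger();

        // Slow downstream operator: processes a buffer, then recycles it.
        Thread receiver = new Thread(() -> {
            try {
                for (int i = 0; i < records; i++) {
                    int[] buffer = inTransit.take();
                    Thread.sleep(1);          // simulate slow processing
                    freeBuffers.put(buffer);  // buffer is free for reuse
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        receiver.start();

        try {
            // Fast upstream operator: needs a free buffer for every record.
            for (int i = 0; i < records; i++) {
                int[] buffer = freeBuffers.take(); // blocks when the pool is empty
                buffer[0] = i;
                inTransit.put(buffer);
                maxInTransit.accumulateAndGet(inTransit.size(), Math::max);
            }
            receiver.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("sender interrupted", e);
        }
        return maxInTransit.get();
    }

    public static void main(String[] args) {
        System.out.println("max buffers in transit: " + run(4, 50));
    }
}
```

However fast the sender loop runs, the returned maximum cannot exceed `poolSize`, because every record needs a buffer from the pool and buffers only return to the pool after the receiver has processed them.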