Apache Flink: How does it handle the backpressure?

jiecxy
Suppose an operator's input stream is faster than its output stream. Its input buffer then fills up and blocks the output thread of the previous operator, which transfers data to this operator. Is that right?

Do Flink and Spark both handle backpressure by blocking the thread? If so, what is the difference between them?

A data source produces data continuously. What happens if its output thread is blocked? Would the buffer overflow?

Re: Apache Flink: How does it handle the backpressure?

rss rss
Hi,

Some time ago I found a problem with backpressure in Spark and prepared a simple test to check it and compare it with Flink: https://github.com/rssdev10/spark-kafka-streaming


+ https://mail-archives.apache.org/mod_mbox/spark-user/201607.mbox/%3CCA+AWphp=2VsLrgSTWFFknw_KMbq88fZhKfvugoe4YYByEt7a=w@...%3E

With Flink, backpressure works. With Spark, it works only if you set up input rate limits per data source. See the source code for an example. In my experience the backpressure detector in Spark works very badly.
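As a rough illustration of what per-source rate limits look like, here is a minimal sketch that only assembles Spark Streaming settings as `spark-submit` arguments. The three property names are standard Spark Streaming configuration keys; the class name, method names, and numeric values are hypothetical, and the linked test project may use different settings.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper: collects rate-limit settings for a Spark Streaming job.
class SparkRateLimitConf {

    // Builds the settings; the limits are records per second and purely illustrative.
    static Map<String, String> rateLimitedConf(int maxRatePerPartition, int receiverMaxRate) {
        Map<String, String> conf = new LinkedHashMap<>();
        // Turn on Spark's dynamic backpressure mechanism.
        conf.put("spark.streaming.backpressure.enabled", "true");
        // Upper bound per Kafka partition for the direct Kafka stream.
        conf.put("spark.streaming.kafka.maxRatePerPartition", String.valueOf(maxRatePerPartition));
        // Upper bound per receiver for receiver-based sources.
        conf.put("spark.streaming.receiver.maxRate", String.valueOf(receiverMaxRate));
        return conf;
    }

    // Renders the settings as "--conf key=value" arguments for spark-submit.
    static String toSubmitArgs(Map<String, String> conf) {
        List<String> parts = new ArrayList<>();
        for (Map.Entry<String, String> e : conf.entrySet()) {
            parts.add("--conf " + e.getKey() + "=" + e.getValue());
        }
        return String.join(" ", parts);
    }

    public static void main(String[] args) {
        System.out.println(toSubmitArgs(rateLimitedConf(1000, 500)));
    }
}
```

The explicit limits cap how much data each batch may pull, independently of what the backpressure detector estimates.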

Best regards

2016-09-02 15:07 GMT+03:00 jiecxy <[hidden email]>:
--
View this message in context: http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/Apache-Flink-How-does-it-handle-the-backpressure-tp8866.html
Sent from the Apache Flink User Mailing List archive at Nabble.com.

Re: Apache Flink: How does it handle the backpressure?

Aljoscha Krettek
That's true. The reason it works in Flink is that a slow downstream operator backpressures its upstream operator, which then slows down. The technical implementation relies on the fact that Flink uses a bounded pool of network buffers. A sending operator writes data to network buffers, and they become free for reuse once the data has been sent. If a downstream operator is slow in processing received network buffers, the upstream operator blocks until more network buffers become available.
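The bounded-pool idea can be sketched with two blocking queues. This is a toy model under my own assumptions, not Flink's actual classes: `BoundedBufferPoolDemo`, `run`, and the sizes are made up for illustration. A fast producer must take a free buffer before sending; a slow consumer returns each buffer to the pool after processing it.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.atomic.AtomicInteger;

// Toy model of backpressure through a fixed pool of buffers (not Flink code).
class BoundedBufferPoolDemo {

    // Runs a fast producer against a slow consumer and returns the largest
    // number of filled buffers observed waiting for the consumer.
    static int run(int poolSize, int records, long consumerDelayMillis) {
        BlockingQueue<int[]> freeBuffers = new ArrayBlockingQueue<>(poolSize);
        BlockingQueue<int[]> inFlight = new ArrayBlockingQueue<>(poolSize);
        AtomicInteger maxInFlight = new AtomicInteger();
        try {
            // The pool starts with every buffer free.
            for (int i = 0; i < poolSize; i++) {
                freeBuffers.put(new int[1]);
            }

            Thread consumer = new Thread(() -> {
                try {
                    for (int i = 0; i < records; i++) {
                        int[] buf = inFlight.take();
                        Thread.sleep(consumerDelayMillis); // simulate slow processing
                        freeBuffers.put(buf);              // recycle the buffer
                    }
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
            consumer.start();

            for (int i = 0; i < records; i++) {
                // Blocks while the pool is empty: this is the backpressure.
                int[] buf = freeBuffers.take();
                buf[0] = i;
                inFlight.put(buf);
                maxInFlight.accumulateAndGet(inFlight.size(), Math::max);
            }
            consumer.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted", e);
        }
        return maxInFlight.get();
    }

    public static void main(String[] args) {
        System.out.println("max buffers in flight: " + run(4, 20, 5) + " (pool size 4)");
    }
}
```

Because only `poolSize` buffers exist, the amount of unprocessed data can never exceed the pool size. The producer is throttled to the consumer's pace, and nothing overflows or is dropped.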

Cheers,
Aljoscha

On Fri, 2 Sep 2016 at 21:57 rss rss <[hidden email]> wrote: