(DEPRECATED) Apache Flink User Mailing List archive.

Iterations and back pressure problem

spoganshev

Iterations and back pressure problem

We've tried using the iterations feature, and under significant load the job sometimes stalls and stops processing events because of high back pressure, both in the tasks that produce records for the iteration and in all the other inputs to this task. It looks like a back pressure loop: the task can't handle all the incoming records, the iteration sink loops back into this task, and it gets back pressured as well. This is basically a "back pressure loop" that stops the job completely.

Is there a way to mitigate this, or to guarantee that it does not occur?

Andrey Zagrebin-2

Re: Iterations and back pressure problem

Hi Sergey,

It seems to be a known issue. The community will hopefully work on this, but I have not seen any updates since the last answer to a similar question [1]; see also [2] and [3].

Best,

Andrey

[1] http://mail-archives.apache.org/mod_mbox/flink-user/201801.mbox/%3CBFD8C506-5B41-47D8-B735-488D03842051%40data-artisans.com%3E

[2] http://mail-archives.apache.org/mod_mbox/flink-user/201801.mbox/%3CBFD8C506-5B41-47D8-B735-488D03842051%40data-artisans.com%3E

[3] https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=66853132

On Mon, Dec 24, 2018 at 7:16 PM Sergei Poganshev <[hidden email]> wrote:

We've tried using the iterations feature, and under significant load the job sometimes stalls and stops processing events because of high back pressure, both in the tasks that produce records for the iteration and in all the other inputs to this task. It looks like a back pressure loop: the task can't handle all the incoming records, the iteration sink loops back into this task, and it gets back pressured as well. This is basically a "back pressure loop" that stops the job completely.

Is there a way to mitigate this, or to guarantee that it does not occur?

Ken Krugler

Re: Iterations and back pressure problem

Hi Sergey,

As Andrey noted, it’s a known issue with (currently) no good solution.

I talk a bit about how we worked around it on slide 26 of my Flink Forward talk on a Flink-based web crawler.

Basically we do some cheesy approximate monitoring of in-flight data, and throttle the key producer so that (hopefully) network buffers don’t fill up to the point of deadlock.

— Ken
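The throttling idea described above can be sketched roughly as follows. This is not the code from the talk, which the thread does not show, and it uses no Flink API: `InFlightThrottle`, `tryEmit`, and `recordLeftLoop` are hypothetical names chosen for illustration. The sketch also assumes a single JVM, whereas in a real job the producer and the loop tail may run in different task managers, which is why the count can only ever be approximate.

```java
import java.util.concurrent.atomic.AtomicLong;

/**
 * Hypothetical helper (not part of Flink): keeps an approximate count of
 * records currently inside the iteration loop and tells the producer to
 * hold back once a configured limit is reached.
 */
final class InFlightThrottle {
    private final long maxInFlight;
    private final AtomicLong inFlight = new AtomicLong();

    InFlightThrottle(long maxInFlight) {
        if (maxInFlight <= 0) {
            throw new IllegalArgumentException("maxInFlight must be positive");
        }
        this.maxInFlight = maxInFlight;
    }

    /** Producer side: returns true if one more record may enter the loop. */
    boolean tryEmit() {
        while (true) {
            long current = inFlight.get();
            if (current >= maxInFlight) {
                return false; // limit reached, caller should wait and retry
            }
            if (inFlight.compareAndSet(current, current + 1)) {
                return true;
            }
        }
    }

    /** Loop side: call when a record leaves the iteration for good. */
    void recordLeftLoop() {
        // Never drop below zero, since the count is only an estimate.
        inFlight.updateAndGet(n -> n > 0 ? n - 1 : 0);
    }

    long approximateInFlight() {
        return inFlight.get();
    }
}
```

A producer would call `tryEmit()` before emitting each record and pause briefly when it returns false. The limit should be chosen well below the point at which network buffers fill, since the counter lags behind the real amount of in-flight data.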

On Dec 24, 2018, at 8:46 AM, Andrey Zagrebin <[hidden email]> wrote:

Hi Sergey,

It seems to be a known issue. The community will hopefully work on this, but I have not seen any updates since the last answer to a similar question [1]; see also [2] and [3].

Best,
Andrey

[1] http://mail-archives.apache.org/mod_mbox/flink-user/201801.mbox/%3CBFD8C506-5B41-47D8-B735-488D03842051%40data-artisans.com%3E
[2] http://mail-archives.apache.org/mod_mbox/flink-user/201801.mbox/%3CBFD8C506-5B41-47D8-B735-488D03842051%40data-artisans.com%3E
[3] https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=66853132

--------------------------

Ken Krugler
http://www.scaleunlimited.com