Avoiding deadlock with iterations

classic Classic list List threaded Threaded
2 messages Options
Reply | Threaded
Open this post in threaded view
|

Avoiding deadlock with iterations

Ken Krugler
Hi all,

We’ve run into deadlocks with two different streaming workflows that have iterations.

In both cases, the issue is with fan-out; if any operation in the loop can emit more records than consumed, eventually a network buffer fills up, and then everyone in the iteration loop is blocked.

One pattern we can use, when the operator that’s causing the fan-out has the ability to decide how much to emit, is to have it behave as an async function, emitting from a queue with multiple threads. If threads start blocking because of back pressure, then the queue begins to fill up, and the function can throttle back how much data it queues up. So this gives us a small (carefully managed) data reservoir we can use to avoid the deadlock.

Is there a better approach? I didn’t see any way to determine how “full” the various network buffers are, and use that for throttling. Plus there’s the issue of partitioning, where it would be impossible in many cases to know the impact of a record being emitted. So even if we could monitor buffers, I don’t think it’s a viable solution.

Thanks,

— Ken

--------------------------------------------
+1 530-210-6378

Reply | Threaded
Open this post in threaded view
|

Re: Avoiding deadlock with iterations

Piotr Nowojski
Hi,

This is a known problem and I don’t think there is an easy solution to this. Please refer to the:

Thanks,
Piotrek

On 25 Jan 2018, at 05:36, Ken Krugler <[hidden email]> wrote:

Hi all,

We’ve run into deadlocks with two different streaming workflows that have iterations.

In both cases, the issue is with fan-out; if any operation in the loop can emit more records than consumed, eventually a network buffer fills up, and then everyone in the iteration loop is blocked.

One pattern we can use, when the operator that’s causing the fan-out has the ability to decide how much to emit, is to have it behave as an async function, emitting from a queue with multiple threads. If threads start blocking because of back pressure, then the queue begins to fill up, and the function can throttle back how much data it queues up. So this gives us a small (carefully managed) data reservoir we can use to avoid the deadlock.

Is there a better approach? I didn’t see any way to determine how “full” the various network buffers are, and use that for throttling. Plus there’s the issue of partitioning, where it would be impossible in many cases to know the impact of a record being emitted. So even if we could monitor buffers, I don’t think it’s a viable solution.

Thanks,

— Ken

--------------------------------------------
+1 530-210-6378