Combined streams backpressure

Combined streams backpressure

Adam Venger
Hi.
I'm thinking about a solution to a problem I have. I need to create keyed session windows from multiple streams of data. The streams are combined using watermarks. The problem is that one of the streams can be slower. This opens too many windows that wait for the slow stream to catch up, which wastes resources and slows down checkpointing and access. I was thinking about some way to detect this and stop reading from the faster streams, which should create backpressure. Is this possible now? If not, would there be any interest in me adding this feature? If it turns out to be a more complex problem, I would like to make a master's thesis from it.

Adam
Re: Combined streams backpressure

Piotr Nowojski-4
Hi,

This is a known problem. Until recently, there was no way to solve it generically for every source. This is changing now, as one of the motivations behind FLIP-27 was to address this issue [1]. Note that, as of now, there are no FLIP-27 sources in the code base yet, but we are planning to change this for Flink 1.12.

For older Flink versions, some users monitor this event time spread and either backpressure or slow down the faster sources, either via a sleeping mapper or by changing the source function itself (the latter is better, but more complicated).
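To make the sleeping-mapper idea more concrete, here is a minimal sketch in plain Java with no Flink dependencies. The class `EventTimeThrottle` and all of its methods are hypothetical names for illustration, not Flink API. It also assumes that all sources can see one shared tracker, which only holds inside a single JVM; a real job would need some other way to share each source's progress.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical helper (not part of Flink): tracks event-time progress per source. */
class EventTimeThrottle {
    private final long maxSpreadMillis;
    // Highest event timestamp seen so far, keyed by an arbitrary source id.
    private final Map<String, Long> latestEventTime = new ConcurrentHashMap<>();

    EventTimeThrottle(long maxSpreadMillis) {
        this.maxSpreadMillis = maxSpreadMillis;
    }

    /** Record an event timestamp for a source, keeping the maximum seen so far. */
    void report(String sourceId, long eventTimeMillis) {
        latestEventTime.merge(sourceId, eventTimeMillis, Math::max);
    }

    /** Event time of the slowest source that has reported at least once. */
    long slowestEventTime() {
        return latestEventTime.values().stream()
                .mapToLong(Long::longValue)
                .min()
                .orElse(Long.MIN_VALUE);
    }

    /** True if this source is ahead of the slowest one by more than the allowed spread. */
    boolean isTooFarAhead(String sourceId) {
        Long own = latestEventTime.get(sourceId);
        if (own == null) {
            return false; // nothing reported yet, so nothing to throttle
        }
        return own - slowestEventTime() > maxSpreadMillis;
    }

    /**
     * Call once per record from the mapper or source loop. Blocks the calling
     * thread while the source is too far ahead, which is what produces backpressure.
     * Caveat: a source that stops reporting (an idle source) blocks the others forever.
     */
    void throttle(String sourceId, long eventTimeMillis, long sleepMillis)
            throws InterruptedException {
        report(sourceId, eventTimeMillis);
        while (isTooFarAhead(sourceId)) {
            Thread.sleep(sleepMillis);
        }
    }
}
```

A mapper placed directly after each source would call `throttle(...)` for every record. While the mapper sleeps it stops consuming, so its input buffers fill up and the faster source is slowed down. Idle sources and sources that have not reported yet are not handled in this sketch.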

If you would like to help with this effort, I would suggest first taking a look at the FLIP-27 document and then getting in touch with the people involved to align efforts. It would be a very welcome feature.

Best,
Piotrek


On Thu, 3 Sep 2020 at 15:28, Adam Venger <[hidden email]> wrote:
Hi.
I'm thinking about a solution to a problem I have. I need to create keyed session windows from multiple streams of data. The streams are combined using watermarks. The problem is that one of the streams can be slower. This opens too many windows that wait for the slow stream to catch up, which wastes resources and slows down checkpointing and access. I was thinking about some way to detect this and stop reading from the faster streams, which should create backpressure. Is this possible now? If not, would there be any interest in me adding this feature? If it turns out to be a more complex problem, I would like to make a master's thesis from it.

Adam