Singal task backpressure problem with Credit-based Flow Control

classic Classic list List threaded Threaded
6 messages Options
Reply | Threaded
Open this post in threaded view
|

Singal task backpressure problem with Credit-based Flow Control

HuWeihua
Hi, all

I ran into a weird single Task BackPressure problem.

JobInfo:
    DAG: Source (1000)-> Map (2000)-> Sink (1000), which is linked via rescale. 
    Flink version: 1.9.0
    
There is no related info in jobmanager/taskamanger log.

Through Metrics, I see that Map (242) 's outPoolUsage is full, but its downstream Sink (121)' s inPoolUsage is 0.

After dumping the memory and analyzing it, I found:
Sink (121)'s RemoteInputChannel.unannouncedCredit = 0,
Map (242)'s CreditBasedSequenceNumberingViewReader.numCreditsAvailable = 0.
This is not consistent with my understanding of the Flink network transmission mechanism.

Can someone help me? Thanks a lot.


Best
Weihua Hu

Reply | Threaded
Open this post in threaded view
|

Re: Singal task backpressure problem with Credit-based Flow Control

Zhijiang(wangzhijiang999)
Hi Weihua,

From your below info, it is with the expectation in credit-based flow control. 

I guess one of the sink parallelism causes the backpressure, so you will see that there are no available credits on Sink side and
the outPoolUsage of Map is almost 100%. It really reflects the credit-based states in the case of backpressure.

If you want to analyze the root cause of backpressure, you can trace the task stack of respective Sink parallelism to find which operation costs much,
then you can increase the parallelism or improve the UDF(if have bottleneck) to have a try. In addition, i am not sure why you choose rescale to shuffle data among operators. The default
forward mode can gain really good performance by default if you adjusting the same parallelism among them.

Best,
Zhijiang
------------------------------------------------------------------
From:Weihua Hu <[hidden email]>
Send Time:2020年5月24日(星期日) 18:32
To:user <[hidden email]>
Subject:Singal task backpressure problem with Credit-based Flow Control

Hi, all

I ran into a weird single Task BackPressure problem.

JobInfo:
    DAG: Source (1000)-> Map (2000)-> Sink (1000), which is linked via rescale. 
    Flink version: 1.9.0
    
There is no related info in jobmanager/taskamanger log.

Through Metrics, I see that Map (242) 's outPoolUsage is full, but its downstream Sink (121)' s inPoolUsage is 0.

After dumping the memory and analyzing it, I found:
Sink (121)'s RemoteInputChannel.unannouncedCredit = 0,
Map (242)'s CreditBasedSequenceNumberingViewReader.numCreditsAvailable = 0.
This is not consistent with my understanding of the Flink network transmission mechanism.

Can someone help me? Thanks a lot.


Best
Weihua Hu


Reply | Threaded
Open this post in threaded view
|

Re: Singal task backpressure problem with Credit-based Flow Control

HuWeihua
Hi, Zhijiang

I understand the normal credit-based backpressure mechanism. as usual the Sink inPoolUsage will be full, and the task stack will also have some information. 
but this time is not the same. The Sink inPoolUsage is 0. 
I also checked the stack. The Map is waiting org.apache.flink.runtime.io.network.buffer.LocalBufferPool.requestMemorySegment
The Sink is waiting data to deal, this is not very in line with expectations.






Best
Weihua Hu

2020年5月24日 21:57,Zhijiang <[hidden email]> 写道:

Hi Weihua,

From your below info, it is with the expectation in credit-based flow control. 

I guess one of the sink parallelism causes the backpressure, so you will see that there are no available credits on Sink side and
the outPoolUsage of Map is almost 100%. It really reflects the credit-based states in the case of backpressure.

If you want to analyze the root cause of backpressure, you can trace the task stack of respective Sink parallelism to find which operation costs much,
then you can increase the parallelism or improve the UDF(if have bottleneck) to have a try. In addition, i am not sure why you choose rescale to shuffle data among operators. The default
forward mode can gain really good performance by default if you adjusting the same parallelism among them.

Best,
Zhijiang
------------------------------------------------------------------
From:Weihua Hu <[hidden email]>
Send Time:2020年5月24日(星期日) 18:32
To:user <[hidden email]>
Subject:Singal task backpressure problem with Credit-based Flow Control

Hi, all

I ran into a weird single Task BackPressure problem.

JobInfo:
    DAG: Source (1000)-> Map (2000)-> Sink (1000), which is linked via rescale. 
    Flink version: 1.9.0
    
There is no related info in jobmanager/taskamanger log.

Through Metrics, I see that Map (242) 's outPoolUsage is full, but its downstream Sink (121)' s inPoolUsage is 0.

After dumping the memory and analyzing it, I found:
Sink (121)'s RemoteInputChannel.unannouncedCredit = 0,
Map (242)'s CreditBasedSequenceNumberingViewReader.numCreditsAvailable = 0.
This is not consistent with my understanding of the Flink network transmission mechanism.

Can someone help me? Thanks a lot.


Best
Weihua Hu



Reply | Threaded
Open this post in threaded view
|

Re: Singal task backpressure problem with Credit-based Flow Control

Piotr Nowojski-3
Hi Weihua,

> After dumping the memory and analyzing it, I found:
> Sink (121)'s RemoteInputChannel.unannouncedCredit = 0,
> Map (242)'s CreditBasedSequenceNumberingViewReader.numCreditsAvailable = 0.
> This is not consistent with my understanding of the Flink network transmission mechanism.

It probably is consistent. Downstream receiver unannounced all of the credits, and it’s simply waiting for the data to arrive, while upstream sender is waiting for the data to be sent down the stream.

Stack trace you posted confirms that the sink you posted has empty input buffer - it’s waiting for input data. Assuming rescale partitoning works as expected and indeed node 242 is connected to node 121, it implies the bottleneck is your data exchange between those two tasks. It could be

- network bottleneck (slow network? Packet losses?)
- machine swapping/long GC pauses (If upstream node is experiencing long pauses it might show up like this)
- cpu bottleneck in the network stack (frequent flushing? SSL?)
- some resource competition (too high parallelism for given number of machines)
- netty threads are not keeping up

It’s hard to say what’s the problem without looking at the resource usage (CPU/Network/Memory/Disk IO), GC logs, code profiling results.

Piotrek

PS Zhijiang:

RescalePartitioner in this case should be connect just two upstream subtasks with one downstream sink. Upstream subtasks N and N+1 should be connected to sink with N/2 id.

On 25 May 2020, at 04:39, Weihua Hu <[hidden email]> wrote:

Hi, Zhijiang

I understand the normal credit-based backpressure mechanism. as usual the Sink inPoolUsage will be full, and the task stack will also have some information. 
but this time is not the same. The Sink inPoolUsage is 0. 
I also checked the stack. The Map is waiting org.apache.flink.runtime.io.network.buffer.LocalBufferPool.requestMemorySegment
The Sink is waiting data to deal, this is not very in line with expectations.


<粘贴的图形-2.tiff>

<粘贴的图形-1.tiff>



Best
Weihua Hu

2020年5月24日 21:57,Zhijiang <[hidden email]> 写道:

Hi Weihua,

From your below info, it is with the expectation in credit-based flow control. 

I guess one of the sink parallelism causes the backpressure, so you will see that there are no available credits on Sink side and
the outPoolUsage of Map is almost 100%. It really reflects the credit-based states in the case of backpressure.

If you want to analyze the root cause of backpressure, you can trace the task stack of respective Sink parallelism to find which operation costs much,
then you can increase the parallelism or improve the UDF(if have bottleneck) to have a try. In addition, i am not sure why you choose rescale to shuffle data among operators. The default
forward mode can gain really good performance by default if you adjusting the same parallelism among them.

Best,
Zhijiang
------------------------------------------------------------------
From:Weihua Hu <[hidden email]>
Send Time:2020年5月24日(星期日) 18:32
To:user <[hidden email]>
Subject:Singal task backpressure problem with Credit-based Flow Control

Hi, all

I ran into a weird single Task BackPressure problem.

JobInfo:
    DAG: Source (1000)-> Map (2000)-> Sink (1000), which is linked via rescale. 
    Flink version: 1.9.0
    
There is no related info in jobmanager/taskamanger log.

Through Metrics, I see that Map (242) 's outPoolUsage is full, but its downstream Sink (121)' s inPoolUsage is 0.

After dumping the memory and analyzing it, I found:
Sink (121)'s RemoteInputChannel.unannouncedCredit = 0,
Map (242)'s CreditBasedSequenceNumberingViewReader.numCreditsAvailable = 0.
This is not consistent with my understanding of the Flink network transmission mechanism.

Can someone help me? Thanks a lot.


Best
Weihua Hu




Reply | Threaded
Open this post in threaded view
|

Re: Singal task backpressure problem with Credit-based Flow Control

HuWeihua
Hi Piotrek,

Thanks for your suggestions, I found some network issues which seems to be the cause of back pressure.

Best
Weihua Hu

2020年5月26日 02:54,Piotr Nowojski <[hidden email]> 写道:

Hi Weihua,

> After dumping the memory and analyzing it, I found:
> Sink (121)'s RemoteInputChannel.unannouncedCredit = 0,
> Map (242)'s CreditBasedSequenceNumberingViewReader.numCreditsAvailable = 0.
> This is not consistent with my understanding of the Flink network transmission mechanism.

It probably is consistent. Downstream receiver unannounced all of the credits, and it’s simply waiting for the data to arrive, while upstream sender is waiting for the data to be sent down the stream.

Stack trace you posted confirms that the sink you posted has empty input buffer - it’s waiting for input data. Assuming rescale partitoning works as expected and indeed node 242 is connected to node 121, it implies the bottleneck is your data exchange between those two tasks. It could be

- network bottleneck (slow network? Packet losses?)
- machine swapping/long GC pauses (If upstream node is experiencing long pauses it might show up like this)
- cpu bottleneck in the network stack (frequent flushing? SSL?)
- some resource competition (too high parallelism for given number of machines)
- netty threads are not keeping up

It’s hard to say what’s the problem without looking at the resource usage (CPU/Network/Memory/Disk IO), GC logs, code profiling results.

Piotrek

PS Zhijiang:

RescalePartitioner in this case should be connect just two upstream subtasks with one downstream sink. Upstream subtasks N and N+1 should be connected to sink with N/2 id.

On 25 May 2020, at 04:39, Weihua Hu <[hidden email]> wrote:

Hi, Zhijiang

I understand the normal credit-based backpressure mechanism. as usual the Sink inPoolUsage will be full, and the task stack will also have some information. 
but this time is not the same. The Sink inPoolUsage is 0. 
I also checked the stack. The Map is waiting org.apache.flink.runtime.io.network.buffer.LocalBufferPool.requestMemorySegment
The Sink is waiting data to deal, this is not very in line with expectations.


<粘贴的图形-2.tiff>

<粘贴的图形-1.tiff>



Best
Weihua Hu

2020年5月24日 21:57,Zhijiang <[hidden email]> 写道:

Hi Weihua,

From your below info, it is with the expectation in credit-based flow control. 

I guess one of the sink parallelism causes the backpressure, so you will see that there are no available credits on Sink side and
the outPoolUsage of Map is almost 100%. It really reflects the credit-based states in the case of backpressure.

If you want to analyze the root cause of backpressure, you can trace the task stack of respective Sink parallelism to find which operation costs much,
then you can increase the parallelism or improve the UDF(if have bottleneck) to have a try. In addition, i am not sure why you choose rescale to shuffle data among operators. The default
forward mode can gain really good performance by default if you adjusting the same parallelism among them.

Best,
Zhijiang
------------------------------------------------------------------
From:Weihua Hu <[hidden email]>
Send Time:2020年5月24日(星期日) 18:32
To:user <[hidden email]>
Subject:Singal task backpressure problem with Credit-based Flow Control

Hi, all

I ran into a weird single Task BackPressure problem.

JobInfo:
    DAG: Source (1000)-> Map (2000)-> Sink (1000), which is linked via rescale. 
    Flink version: 1.9.0
    
There is no related info in jobmanager/taskamanger log.

Through Metrics, I see that Map (242) 's outPoolUsage is full, but its downstream Sink (121)' s inPoolUsage is 0.

After dumping the memory and analyzing it, I found:
Sink (121)'s RemoteInputChannel.unannouncedCredit = 0,
Map (242)'s CreditBasedSequenceNumberingViewReader.numCreditsAvailable = 0.
This is not consistent with my understanding of the Flink network transmission mechanism.

Can someone help me? Thanks a lot.


Best
Weihua Hu





Reply | Threaded
Open this post in threaded view
|

Re: Singal task backpressure problem with Credit-based Flow Control

Piotr Nowojski-3
Hi Weihua,

Good to hear that you have found the problem. Let us know if you find some other problems after all.

Piotrek

On 27 May 2020, at 14:18, Weihua Hu <[hidden email]> wrote:

Hi Piotrek,

Thanks for your suggestions, I found some network issues which seems to be the cause of back pressure.

Best
Weihua Hu

2020年5月26日 02:54,Piotr Nowojski <[hidden email]> 写道:

Hi Weihua,

> After dumping the memory and analyzing it, I found:
> Sink (121)'s RemoteInputChannel.unannouncedCredit = 0,
> Map (242)'s CreditBasedSequenceNumberingViewReader.numCreditsAvailable = 0.
> This is not consistent with my understanding of the Flink network transmission mechanism.

It probably is consistent. Downstream receiver unannounced all of the credits, and it’s simply waiting for the data to arrive, while upstream sender is waiting for the data to be sent down the stream.

Stack trace you posted confirms that the sink you posted has empty input buffer - it’s waiting for input data. Assuming rescale partitoning works as expected and indeed node 242 is connected to node 121, it implies the bottleneck is your data exchange between those two tasks. It could be

- network bottleneck (slow network? Packet losses?)
- machine swapping/long GC pauses (If upstream node is experiencing long pauses it might show up like this)
- cpu bottleneck in the network stack (frequent flushing? SSL?)
- some resource competition (too high parallelism for given number of machines)
- netty threads are not keeping up

It’s hard to say what’s the problem without looking at the resource usage (CPU/Network/Memory/Disk IO), GC logs, code profiling results.

Piotrek

PS Zhijiang:

RescalePartitioner in this case should be connect just two upstream subtasks with one downstream sink. Upstream subtasks N and N+1 should be connected to sink with N/2 id.

On 25 May 2020, at 04:39, Weihua Hu <[hidden email]> wrote:

Hi, Zhijiang

I understand the normal credit-based backpressure mechanism. as usual the Sink inPoolUsage will be full, and the task stack will also have some information. 
but this time is not the same. The Sink inPoolUsage is 0. 
I also checked the stack. The Map is waiting org.apache.flink.runtime.io.network.buffer.LocalBufferPool.requestMemorySegment
The Sink is waiting data to deal, this is not very in line with expectations.


<粘贴的图形-2.tiff>

<粘贴的图形-1.tiff>



Best
Weihua Hu

2020年5月24日 21:57,Zhijiang <[hidden email]> 写道:

Hi Weihua,

From your below info, it is with the expectation in credit-based flow control. 

I guess one of the sink parallelism causes the backpressure, so you will see that there are no available credits on Sink side and
the outPoolUsage of Map is almost 100%. It really reflects the credit-based states in the case of backpressure.

If you want to analyze the root cause of backpressure, you can trace the task stack of respective Sink parallelism to find which operation costs much,
then you can increase the parallelism or improve the UDF(if have bottleneck) to have a try. In addition, i am not sure why you choose rescale to shuffle data among operators. The default
forward mode can gain really good performance by default if you adjusting the same parallelism among them.

Best,
Zhijiang
------------------------------------------------------------------
From:Weihua Hu <[hidden email]>
Send Time:2020年5月24日(星期日) 18:32
To:user <[hidden email]>
Subject:Singal task backpressure problem with Credit-based Flow Control

Hi, all

I ran into a weird single Task BackPressure problem.

JobInfo:
    DAG: Source (1000)-> Map (2000)-> Sink (1000), which is linked via rescale. 
    Flink version: 1.9.0
    
There is no related info in jobmanager/taskamanger log.

Through Metrics, I see that Map (242) 's outPoolUsage is full, but its downstream Sink (121)' s inPoolUsage is 0.

After dumping the memory and analyzing it, I found:
Sink (121)'s RemoteInputChannel.unannouncedCredit = 0,
Map (242)'s CreditBasedSequenceNumberingViewReader.numCreditsAvailable = 0.
This is not consistent with my understanding of the Flink network transmission mechanism.

Can someone help me? Thanks a lot.


Best
Weihua Hu