backpressure metrics


backpressure metrics

Steven Wu

Flink has two backpressure-related metrics: “lastCheckpointAlignmentBuffered” and “checkpointAlignmentTime”. But they seem to always report zero. The web UI shows something similar: “Buffered During Alignment” is always zero, even when backpressure testing shows high backpressure for some operators. Has anyone else seen a similar problem?

We are running Flink 1.4.0 with some cherry-picked fixes. There was a bug, with a fix, for 1.5 and above, which shouldn't affect us.

Thanks,
Steven

Re: backpressure metrics

Nagarjun Guraja
Hi Steven,

The 'Buffered During Alignment' metric you are talking about will always be zero when the job runs in AT_LEAST_ONCE mode. Is that the case with your job? My understanding is that backpressure can only be monitored by sampling thread stack traces on demand and inferring it from contention for network buffers.
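
To illustrate why the alignment metric stays at zero, here is a minimal sketch of barrier handling in the two checkpointing modes. It is a toy model, not Flink's actual code: the class `AlignmentSketch`, the `Event` type, and the method `bufferedDuringAlignment` are hypothetical names made up for this example. The assumption it encodes is that in exactly-once mode, records arriving on a channel that has already delivered its barrier are held back until the barriers from all channels have arrived, while in at-least-once mode records are processed immediately and nothing is buffered.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Toy model of checkpoint barrier handling; NOT Flink's real classes. */
final class AlignmentSketch {

    /** One input event: a data record or a checkpoint barrier on a channel. */
    static final class Event {
        final int channel;
        final boolean isBarrier;

        Event(int channel, boolean isBarrier) {
            this.channel = channel;
            this.isBarrier = isBarrier;
        }

        static Event record(int channel) { return new Event(channel, false); }
        static Event barrier(int channel) { return new Event(channel, true); }
    }

    /**
     * Counts records that would be buffered while one checkpoint's barriers
     * are being aligned across numChannels input channels.
     */
    static int bufferedDuringAlignment(List<Event> events, int numChannels, boolean exactlyOnce) {
        Set<Integer> channelsWithBarrier = new HashSet<>();
        int buffered = 0;
        for (Event e : events) {
            if (e.isBarrier) {
                channelsWithBarrier.add(e.channel);
                if (channelsWithBarrier.size() == numChannels) {
                    // All barriers arrived: alignment for this checkpoint ends.
                    channelsWithBarrier.clear();
                }
            } else if (exactlyOnce && channelsWithBarrier.contains(e.channel)) {
                // Exactly-once: hold back records from channels that are
                // already past the barrier until the other channels catch up.
                buffered++;
            }
            // At-least-once: records are processed right away, so the
            // buffered count never increases.
        }
        return buffered;
    }

    public static void main(String[] args) {
        List<Event> events = Arrays.asList(
                Event.barrier(0),
                Event.record(0),   // channel 0 is past its barrier
                Event.record(0),
                Event.record(1),   // channel 1 has not delivered its barrier yet
                Event.barrier(1),  // alignment completes here
                Event.record(0));
        System.out.println("EXACTLY_ONCE buffered:  " + bufferedDuringAlignment(events, 2, true));  // 2
        System.out.println("AT_LEAST_ONCE buffered: " + bufferedDuringAlignment(events, 2, false)); // 0
    }
}
```

In this model the at-least-once count is zero no matter how slow channel 1 is, which matches the metric being zero even under heavy backpressure.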

Regards,
Nagarjun

Success is not final, failure is not fatal: it is the courage to continue that counts. 
- Winston Churchill - 


On Wed, Nov 21, 2018 at 1:50 PM Steven Wu <[hidden email]> wrote:

[quoted text omitted; see the original post above]

Re: backpressure metrics

Steven Wu
Nagarjun, thanks a lot for the reply, which makes sense to me. Yes, we are running in AT_LEAST_ONCE mode.

On Wed, Nov 21, 2018 at 3:19 PM Nagarjun Guraja <[hidden email]> wrote:
[quoted text omitted; see the posts above]