Nagarjun, thanks a lot for the reply, which makes sense to me. Yes, we are running in AT_LEAST_ONCE mode.
On Wed, Nov 21, 2018 at 3:19 PM Nagarjun Guraja <[hidden email]> wrote:
Hi Steven,
The 'Buffered During Alignment' metric you mention will always be zero when the job runs in AT_LEAST_ONCE mode. Is that the case with your job? My understanding is that backpressure can only be monitored by sampling thread stack traces on demand and interpreting the result based on contention for network buffers.
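To illustrate the sampling idea, here is a minimal sketch (not Flink's actual code) of how a backpressure ratio could be derived from sampled stack traces. The frame name `LocalBufferPool.requestBufferBlocking` and the 0.10 / 0.5 thresholds are assumptions based on how the web UI reports OK / LOW / HIGH; the exact method name differs between Flink versions. The function names are hypothetical.

```python
from typing import List

# Assumption: the frame that indicates a task is blocked waiting for a
# network buffer. The real method name varies across Flink versions.
BLOCKING_FRAME = "LocalBufferPool.requestBufferBlocking"


def backpressure_ratio(samples: List[List[str]]) -> float:
    """Return the fraction of sampled stack traces blocked on a buffer request."""
    if not samples:
        return 0.0
    blocked = sum(
        1 for trace in samples
        if any(BLOCKING_FRAME in frame for frame in trace)
    )
    return blocked / len(samples)


def backpressure_level(ratio: float) -> str:
    """Map a ratio to a level (assumed thresholds: <= 0.10 OK, <= 0.5 LOW, else HIGH)."""
    if ratio <= 0.10:
        return "OK"
    if ratio <= 0.5:
        return "LOW"
    return "HIGH"


if __name__ == "__main__":
    # Hypothetical sample set: 60 of 100 traces are stuck requesting a buffer.
    blocked_trace = ["org.apache.flink.runtime.io.network.buffer." + BLOCKING_FRAME]
    running_trace = ["com.example.MyMapFunction.map"]
    samples = [blocked_trace] * 60 + [running_trace] * 40
    ratio = backpressure_ratio(samples)
    print(ratio, backpressure_level(ratio))
```

Because the ratio comes from stack trace samples rather than from checkpoint alignment, it can be high even while the alignment metrics stay at zero.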
Regards,
Nagarjun
Success is not final, failure is not fatal: it is the courage to continue that counts.
- Winston Churchill -
Flink has two backpressure-related metrics, “lastCheckpointAlignmentBuffered” and “checkpointAlignmentTime”, but they seem to always report zero. The web UI behaves similarly: “Buffered During Alignment” always shows zero, even when backpressure testing shows high backpressure for some operators. Has anyone else seen a similar problem?
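For context on what the alignment metrics measure, here is a toy simulation (a simplified sketch, not Flink's implementation) of checkpoint barrier alignment on a task with several input channels. With exactly-once checkpointing, records arriving on a channel that has already delivered its barrier are held back until the remaining barriers arrive; with at-least-once, nothing is held back. The function name and event format are hypothetical, and the sketch counts records where the real metric reports bytes.

```python
from typing import List, Tuple


def buffered_during_alignment(
    events: List[Tuple[int, str]], num_channels: int, exactly_once: bool
) -> int:
    """Count records held back while waiting for barriers from all channels.

    events is an ordered list of (channel, kind) pairs, where kind is
    either "record" or "barrier".
    """
    blocked = set()  # channels whose barrier has already arrived
    buffered = 0
    for channel, kind in events:
        if kind == "barrier":
            blocked.add(channel)
            if len(blocked) == num_channels:
                blocked.clear()  # all barriers seen: alignment is complete
        elif exactly_once and channel in blocked:
            buffered += 1  # record must wait until alignment finishes
        # In at-least-once mode, records are processed right away.
    return buffered


if __name__ == "__main__":
    events = [
        (0, "barrier"),
        (0, "record"),
        (0, "record"),
        (1, "record"),
        (1, "barrier"),
        (0, "record"),
    ]
    print(buffered_during_alignment(events, 2, exactly_once=True))
    print(buffered_during_alignment(events, 2, exactly_once=False))
```

In this model the at-least-once count is zero for any event sequence, since no record is ever held back.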
We are running Flink 1.4.0 with some cherry-picked fixes. There was a bug, with a fix for 1.5 and above, which shouldn't affect us.
Thanks,
Steven