Hello,

I am running some tests with Flink 1.11.1 and have noticed something strange going on with the exported metrics.

I have the following configuration:

metrics.reporter.graphite.class: org.apache.flink.metrics.graphite.GraphiteReporterFactory
metrics.reporter.graphite.host: graphite
metrics.reporter.graphite.port: 8080
metrics.reporter.graphite.protocol: tcp
metrics.reporter.graphite.interval: 10 SECONDS

This should send metrics to Graphite every 10 seconds, and it does with low parallelism (e.g. <= 20): we get all metrics, all the time, every 10 seconds.

However, when I scale my job to a parallelism of 200 or more, the metrics are no longer sent every 10 seconds. Sometimes they are missing for up to 3 reporting cycles.

I have had a brief look at the code here:
https://github.com/apache/flink/blob/release-1.11.1/flink-runtime/src/main/java/org/apache/flink/runtime/metrics/MetricRegistryImpl.java#L107-L144
and it looks like reporting runs on a separate thread. My first guess was that this thread is doing too much work.

I have tried lowering the reporting interval from 10 SECONDS to 6-7 SECONDS, but metrics are still missing in that case. This happens even for simple jobs such as "source -> map -> sink" when the parallelism is high.

What can I do to debug this further or make it work? Has anyone come across this before?

Regards,
Nikola Hrusov
IIRC this can be caused by the Carbon MAX_CREATES_PER_MINUTE setting.

I would deem it unlikely that the reporter thread is busy for 30 seconds.
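One way to tell the two explanations apart is to look at what actually arrives on the wire before Carbon processes it. The following is only a rough sketch, not something from the Flink or Graphite code base: it assumes the reporter sends the Graphite plaintext format (one "<path> <value> <timestamp>" line per metric) and that you have captured those lines somewhere, for example with a packet capture or a stand-in listener. The function names, the tolerance parameter and the sample metric paths are made up for illustration.

```python
from collections import defaultdict


def parse_line(line):
    """Parse one Graphite plaintext line: '<path> <value> <timestamp>'."""
    # rsplit keeps the path intact even if it happens to contain spaces.
    path, value, ts = line.rsplit(None, 2)
    return path, float(value), int(ts)


def find_gaps(lines, interval=10, tolerance=2):
    """Return {path: [(prev_ts, next_ts, missed_cycles), ...]}.

    A gap is reported when two consecutive datapoints of the same
    metric are more than interval + tolerance seconds apart.
    """
    seen = defaultdict(list)
    for line in lines:
        line = line.strip()
        if not line:
            continue
        path, _value, ts = parse_line(line)
        seen[path].append(ts)

    gaps = {}
    for path, stamps in seen.items():
        stamps.sort()
        missed = []
        for prev, cur in zip(stamps, stamps[1:]):
            delta = cur - prev
            if delta > interval + tolerance:
                # Estimate how many reporting cycles were skipped.
                missed.append((prev, cur, round(delta / interval) - 1))
        if missed:
            gaps[path] = missed
    return gaps


if __name__ == "__main__":
    # Hypothetical captured lines; real paths depend on your scope formats.
    captured = [
        "job.task.0.numRecordsIn 5 1597157820",
        "job.task.0.numRecordsIn 9 1597157830",
        "job.task.0.numRecordsIn 20 1597157860",
    ]
    for path, missed in find_gaps(captured, interval=10).items():
        print(path, missed)
```

If the captured lines themselves show gaps, the problem is on the Flink side (reporter thread or network). If the captured lines arrive every 10 seconds but the datapoints are missing in Graphite, the Carbon side, such as the MAX_CREATES_PER_MINUTE limit, is the more likely place to look.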
On 11/08/2020 16:57, Nikola Hrusov wrote: