The reported exception looks quite similar to the one in this
thread, which was supposedly caused by Datadog rate limits, but
I don't think this was thoroughly investigated.
(Bear in mind that each container has its own reporter; with the
default reporting interval of 10 seconds you quickly reach fairly
high reports/second rates.)
Alternatively, it could just be plain connectivity issues.
If the issues do not persist for long, however, no metrics should
be lost, so you may be able to ignore them.
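If rate limits do turn out to be the cause, one thing to try is a
longer reporting interval, so that each container reports less often.
As a rough illustration with made-up numbers: 30 containers reporting
every 10 seconds is about 3 reports/second in total, while the same
30 containers at 60 seconds is about 0.5 reports/second. A minimal
sketch, assuming the reporter is named dghttp as in the configuration
below; the 60 second value is only an example, not a recommendation:

```yaml
# flink-conf.yaml
# Illustrative value only: raise the interval from the 10 second default
# so that each container sends reports to Datadog less frequently.
metrics.reporter.dghttp.interval: 60 SECONDS
```

The trade-off is coarser metric resolution in Datadog, so this only
helps if the timeouts really are related to the request rate.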
On 2/2/2021 7:31 PM, Claude M wrote:
Hello,
I have a Flink jobmanager and taskmanagers deployed in a
Kubernetes cluster. I integrated it with Datadog by having
the following specified in the flink-conf.yaml.
metrics.reporter.dghttp.class: org.apache.flink.metrics.datadog.DatadogHttpReporter
metrics.reporter.dghttp.apikey: <DD_API_KEY>
However, I'm seeing random timeouts in the log and don't
know why this is occurring or how to solve the issue.
Please see the attached file showing the error.
Thanks