(DEPRECATED) Apache Flink User Mailing List archive.

Flink Datadog Timeout


Claude Murad

Flink Datadog Timeout

Hello,

I have a Flink jobmanager and taskmanagers deployed in a Kubernetes cluster. I integrated them with Datadog by adding the following to flink-conf.yaml:

metrics.reporter.dghttp.class: org.apache.flink.metrics.datadog.DatadogHttpReporter
metrics.reporter.dghttp.apikey: <DD_API_KEY>

However, I'm seeing random timeouts in the log, and I don't know why they occur or how to resolve them. Please see the attached file showing the error.

Thanks

Attachment: FlinkDatadogTimeout.txt (3K)

Chesnay Schepler

Re: Flink Datadog Timeout

The reported exception looks quite similar to the one in this thread, which was supposedly caused by Datadog rate limits, though I don't think that was ever thoroughly investigated.

(Bear in mind that each container has its own reporter; with the default reporting interval of 10 seconds, you quickly reach fairly high reports-per-second rates.)
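
To illustrate the point about report rates: as a rough estimate, N containers reporting every T seconds send on the order of N/T reports per second, so a hypothetical 50 containers at a 10-second interval would produce about 5 reports per second. If rate limiting is indeed the cause (which is unconfirmed), lengthening the reporter's interval in flink-conf.yaml may reduce the pressure. The sketch below assumes the reporter name dghttp from the original post and uses Flink's generic per-reporter interval option; the 60-second value is an arbitrary illustration, not a recommendation.

```yaml
# Same reporter definition as in the original post
metrics.reporter.dghttp.class: org.apache.flink.metrics.datadog.DatadogHttpReporter
metrics.reporter.dghttp.apikey: <DD_API_KEY>
# Report less often than every 10 seconds (illustrative value, tune as needed)
metrics.reporter.dghttp.interval: 60 SECONDS
```

A longer interval trades metric granularity in Datadog for fewer HTTP requests per container.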

Alternatively, it could just be plain connectivity issues.

However, if the issues do not persist for long, no metrics should be lost, so you may be able to ignore them.

On 2/2/2021 7:31 PM, Claude M wrote:

Hello,

I have a Flink jobmanager and taskmanagers deployed in a Kubernetes cluster. I integrated them with Datadog by adding the following to flink-conf.yaml:

metrics.reporter.dghttp.class: org.apache.flink.metrics.datadog.DatadogHttpReporter
metrics.reporter.dghttp.apikey: <DD_API_KEY>

However, I'm seeing random timeouts in the log, and I don't know why they occur or how to resolve them. Please see the attached file showing the error.

Thanks