datadog http reporter metrics

Yitzchak Lieberman
Hi.

Has anyone encountered a problem sending metrics with the Datadog HTTP reporter?
My setup is Flink 1.8.2 deployed on k8s with 1 job manager and 10 task managers.
After every version deploy I see metrics on my dashboard, but after a few minutes they stop being sent from all task managers, while the job manager keeps sending (with no error/warn in the logs).
Could Datadog be blocking them due to the cluster size? My staging cluster with 3 servers sends without any problem.

Thanks in advance,
Yitzchak.

Re: datadog http reporter metrics

Yitzchak Lieberman
Anyone?

On Wed, Mar 11, 2020 at 11:23 PM Yitzchak Lieberman <[hidden email]> wrote:
Hi.

Has anyone encountered a problem sending metrics with the Datadog HTTP reporter?
My setup is Flink 1.8.2 deployed on k8s with 1 job manager and 10 task managers.
After every version deploy I see metrics on my dashboard, but after a few minutes they stop being sent from all task managers, while the job manager keeps sending (with no error/warn in the logs).
Could Datadog be blocking them due to the cluster size? My staging cluster with 3 servers sends without any problem.

Thanks in advance,
Yitzchak.

Re: datadog http reporter metrics

Chesnay Schepler
Do you see anything in the logs? In another thread a user reported that the Datadog reporter could stop working when faced with a large number of metrics, because Datadog was rejecting the report for being too large.
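
To illustrate the "report too large" failure mode, here is a minimal sketch of splitting one large batch of metrics into several smaller requests. It is not Flink's reporter code: `chunk_report`, `max_per_request`, and the example metric names are all hypothetical, and the right batch size would depend on whatever limit Datadog enforces.

```python
from typing import Iterator, List, Sequence, TypeVar

T = TypeVar("T")


def chunk_report(metrics: Sequence[T], max_per_request: int) -> Iterator[List[T]]:
    """Yield consecutive slices of `metrics`, each with at most `max_per_request` items."""
    if max_per_request <= 0:
        raise ValueError("max_per_request must be positive")
    for start in range(0, len(metrics), max_per_request):
        yield list(metrics[start:start + max_per_request])


if __name__ == "__main__":
    # Hypothetical series; a real task manager report can hold far more entries.
    series = [{"metric": f"flink.example.metric_{i}", "points": [[0, i]]} for i in range(5)]
    for batch in chunk_report(series, max_per_request=2):
        # Each batch would be serialized and sent as its own HTTP request.
        print(len(batch), "series in this request")
```

With many metrics per task manager, sending each batch separately keeps every individual request small, at the cost of more requests per reporting interval.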

On 15/03/2020 12:22, Yitzchak Lieberman wrote:
Anyone?


Re: [EXT.MSG] Re: datadog http reporter metrics

Yitzchak Lieberman
No, I tried to find error/warn logs for rejected metrics, but found nothing.
In that case there should be an error, right (when the report is too large)?
I saw that there are some changes to the Datadog reporter in version 1.10; maybe I should upgrade to that version?

On Mon, Mar 16, 2020 at 11:47 AM Chesnay Schepler <[hidden email]> wrote:
Do you see anything in the logs? In another thread a user reported that the Datadog reporter could stop working when faced with a large number of metrics, because Datadog was rejecting the report for being too large.


Re: [EXT.MSG] Re: datadog http reporter metrics

Chesnay Schepler
Unfortunately, it would only be logged when using 1.10; but you should be able to use the 1.10 version of the reporter with your version of Flink to at least confirm that it is the same issue as FLINK-16611.
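
As a rough illustration of what "being logged" buys you, the sketch below checks an HTTP status code and emits a warning when the report was not accepted. It is not the reporter's actual logging code: `report_accepted`, the logger name, and the example values are hypothetical. 413 (Payload Too Large) is the standard HTTP status for an oversized body; whether Datadog answers with exactly that code here is an assumption.

```python
import logging

logger = logging.getLogger("datadog_report_check")  # hypothetical logger name


def report_accepted(status_code: int, payload_bytes: int) -> bool:
    """Return True for a 2xx response; otherwise log a warning and return False."""
    if 200 <= status_code < 300:
        return True
    logger.warning(
        "Report of %d bytes was rejected with HTTP status %d",
        payload_bytes,
        status_code,
    )
    return False


if __name__ == "__main__":
    logging.basicConfig(level=logging.WARNING)
    report_accepted(202, 12_000)      # accepted, nothing is logged
    report_accepted(413, 5_000_000)   # rejected, a warning is logged
```

Without a check like this, a rejected report fails silently, which matches the symptom of metrics disappearing with nothing in the logs.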

On 16/03/2020 11:35, Yitzchak Lieberman wrote:
No, I tried to find error/warn logs for rejected metrics, but found nothing.
In that case there should be an error, right (when the report is too large)?
I saw that there are some changes to the Datadog reporter in version 1.10; maybe I should upgrade to that version?
