Hi, I need a deeper look at how RocksDB is performing, so I turned on the RocksDB native metrics. However, I don't see any of these metrics in DataDog (though I do have other Flink metrics in DataDog). I only see the RocksDB metrics in the operator metrics in the Flink UI, and unfortunately I can't zoom in or out of those metrics or compare multiple operators at a time, which makes it really difficult to get an overview of how RocksDB is doing. Is there any way to get the RocksDB native metrics forwarded to DataDog? Thanks!
Any metric that is shown in the Flink UI should also appear in DataDog.
If this is not the case, then something is going wrong within the reporter.
Is there anything suspicious in the Flink logs?
Can you give some examples of metrics that do show up in DataDog?
On 1/26/2021 6:32 PM, Rex Fenley wrote:
All taskmanager and jobmanager metrics show up. Anything specific to an operator does not. For example, flink.taskmanager.Status.JVM.Memory.Heap.Used shows up, but I can't see stats on an individual operator. I mostly followed a combination of https://docs.datadoghq.com/integrations/flink/#metric-collection and https://ci.apache.org/projects/flink/flink-docs-release-1.11/monitoring/metrics.html#datadog-orgapacheflinkmetricsdatadogdatadoghttpreporter since Datadog's documentation was slightly out of date.
Thanks
On Tue, Jan 26, 2021 at 10:28 AM Chesnay Schepler <[hidden email]> wrote:
-- Rex Fenley | Software Engineer - Mobile and Backend Remind.com
It is good to know that something from
the task executors arrives at datadog.
Do you see any metrics for a specific
job, like the numRestarts metric of
the JobManager?
Are you using the default scope
formats, or have you modified them?
Could you try these instead and report
back? (I replaced all job/task/operator names with their IDs, in
case some special character is messing with datadog)
metrics.scope.jm: <host>.jobmanager
metrics.scope.jm.job: <host>.jobmanager.<job_id>
metrics.scope.tm: <host>.taskmanager.<tm_id>
metrics.scope.tm.job: <host>.taskmanager.<tm_id>.<job_id>
metrics.scope.task: <host>.taskmanager.<tm_id>.<job_id>.<task_id>.<subtask_index>
metrics.scope.operator: <host>.taskmanager.<tm_id>.<job_id>.<operator_id>.<subtask_index>
On 1/26/2021 9:28 PM, Rex Fenley wrote:
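For readers following along, here is a rough Python sketch of how the scope formats suggested above expand into a metric identifier prefix. It is only an illustration of the placeholder substitution, not Flink's actual implementation; the function names and all variable values (host, IDs) are made up for the example.

```python
import re

# Scope formats as suggested in the message above.
SCOPE_FORMATS = {
    "metrics.scope.jm": "<host>.jobmanager",
    "metrics.scope.jm.job": "<host>.jobmanager.<job_id>",
    "metrics.scope.tm": "<host>.taskmanager.<tm_id>",
    "metrics.scope.tm.job": "<host>.taskmanager.<tm_id>.<job_id>",
    "metrics.scope.task": "<host>.taskmanager.<tm_id>.<job_id>.<task_id>.<subtask_index>",
    "metrics.scope.operator": "<host>.taskmanager.<tm_id>.<job_id>.<operator_id>.<subtask_index>",
}


def expand_scope(fmt, variables):
    """Replace each <name> placeholder with its value; unknown placeholders stay as-is."""
    return re.sub(
        r"<([^<>]+)>",
        lambda m: variables.get(m.group(1), m.group(0)),
        fmt,
    )


def metric_identifier(scope_key, metric_name, variables):
    """Join the expanded scope and the metric name with a dot."""
    return expand_scope(SCOPE_FORMATS[scope_key], variables) + "." + metric_name


# Hypothetical values, purely for illustration.
example_variables = {
    "host": "worker-1",
    "tm_id": "tm-42",
    "job_id": "a1b2",
    "operator_id": "c3d4",
    "subtask_index": "0",
}

if __name__ == "__main__":
    print(metric_identifier("metrics.scope.operator", "numRecordsIn", example_variables))
```

With IDs in place of names, the identifier contains only characters from the IDs themselves, which is the point of the suggestion: it rules out special characters in job, task, or operator names as the cause.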
Oddly, I'm seeing them now. I'm not sure what has changed.
Fwiw, we have modified the scopes per https://docs.datadoghq.com/integrations/flink/#metric-collection but their modifications ids as tags. We do need to modify them according to that documentation - "Note: The system scopes must be remapped for your Flink metrics to be supported, otherwise they are submitted as custom metrics."
Could we instead add host and ids as tags to our metrics?
Thanks for your help!
On Tue, Jan 26, 2021 at 2:49 PM Chesnay Schepler <[hidden email]> wrote:
AFAIK all IDs (and in fact all variables except <host>) are exposed as tags. (The <host> is transmitted separately, and I would've thought Datadog automatically provides similar functionality for it.)
On 1/27/2021 2:11 AM, Rex Fenley wrote:
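As a rough illustration of the behaviour described above (every scope variable except <host> becoming a tag), a Python sketch might look like the following. This is an assumption-laden sketch, not the reporter's actual code: the function name, the `name:value` tag layout, and the sample values are all hypothetical.

```python
def to_datadog_tags(variables):
    """Turn {'<job_id>': 'a1b2', ...} into ['job_id:a1b2', ...].

    The <host> variable is skipped because, per the reply above,
    it is transmitted separately rather than as a tag.
    """
    tags = []
    for key, value in sorted(variables.items()):
        if key == "<host>":
            continue
        # Drop the surrounding angle brackets from the variable name.
        tags.append(f"{key.strip('<>')}:{value}")
    return tags


if __name__ == "__main__":
    # Hypothetical variable values, purely for illustration.
    print(to_datadog_tags({"<host>": "worker-1", "<job_id>": "a1b2", "<tm_id>": "tm-42"}))
```

If the tags arrive like this, filtering or grouping by job or operator ID in Datadog should work without encoding those IDs into the metric name.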