Logging Flink metrics


Logging Flink metrics

Manish G
Hi,

Is it possible to log Flink metrics in the application logs, apart from publishing them to Prometheus?

With regards

Re: Logging Flink metrics

Chesnay Schepler
Have you looked at the SLF4J reporter?

https://ci.apache.org/projects/flink/flink-docs-release-1.10/monitoring/metrics.html#slf4j-orgapacheflinkmetricsslf4jslf4jreporter
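
In short, the setup from that page boils down to copying the flink-metrics-slf4j jar into the lib/ directory and adding flink-conf.yaml entries roughly like these (the interval value is just an example):

metrics.reporter.slf4j.class: org.apache.flink.metrics.slf4j.Slf4jReporter
metrics.reporter.slf4j.interval: 60 SECONDS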


Re: Logging Flink metrics

Manish G
Hi,

Thanks for this.

I did the configuration as described at the link (changes in flink-conf.yaml, copying the reporter jar into the lib directory), registered a Meter with the metric group, and invoked its markEvent() method in the target code. But I don't see any related logs.
I am doing this all on my local computer.
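
For reference, the meter registration in my code looks roughly like this (simplified; the class and metric names are just placeholders):

import org.apache.flink.api.common.functions.RichMapFunction;
import org.apache.flink.configuration.Configuration;
import org.apache.flink.metrics.Meter;
import org.apache.flink.metrics.MeterView;

public class EventMapper extends RichMapFunction<String, String> {

    private transient Meter eventMeter;

    @Override
    public void open(Configuration parameters) {
        // register the meter under this operator's metric group
        this.eventMeter = getRuntimeContext()
                .getMetricGroup()
                .meter("myCustomMeter", new MeterView(60));
    }

    @Override
    public String map(String value) {
        eventMeter.markEvent();  // mark one event per processed record
        return value;
    }
}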

Anything else I need to do?

With regards
Manish


Re: Logging Flink metrics

Chesnay Schepler
How long did the job run for, and what is the configured interval?



Re: Logging Flink metrics

Manish G
The job is an infinite streaming one, so it keeps running. The Flink configuration is:

metrics.reporter.slf4j.class: org.apache.flink.metrics.slf4j.Slf4jReporter
metrics.reporter.slf4j.interval: 30 SECONDS




Re: Logging Flink metrics

Chesnay Schepler
Please enable debug logging and search for warnings from the metric groups/registry/reporter.

If you cannot find anything suspicious, you can also send the full log to me directly.


Re: Logging Flink metrics

Chesnay Schepler
You have explicitly configured a reporter list, resulting in the slf4j reporter being ignored:

2020-07-06 13:48:22,191 INFO  org.apache.flink.configuration.GlobalConfiguration            - Loading configuration property: metrics.reporters, prom
2020-07-06 13:48:23,203 INFO  org.apache.flink.runtime.metrics.ReporterSetup                - Excluding reporter slf4j, not configured in reporter list (prom).

Note that nowadays metrics.reporters is no longer required; the set of reporters is determined automatically from the configured properties. Its only remaining use-case is disabling a reporter without having to remove its entire configuration.
I'd suggest just removing the option, trying again, and reporting back.
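
In your case that means either deleting the metrics.reporters line from flink-conf.yaml, or listing both reporters in it, along the lines of:

metrics.reporters: prom,slf4j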


Re: Logging Flink metrics

Manish G
Hi,

So I have the following in flink-conf.yaml:
//////////////////////////////////////////////////////
metrics.reporter.prom.class: org.apache.flink.metrics.prometheus.PrometheusReporter
metrics.reporter.prom.host: 127.0.0.1
metrics.reporter.prom.port: 9999
metrics.reporter.slf4j.class: org.apache.flink.metrics.slf4j.Slf4jReporter
metrics.reporter.slf4j.interval: 30 SECONDS
//////////////////////////////////////////////////////

While I can see the custom metrics in the TaskManager logs, the Prometheus dashboard doesn't show them.

With regards


Re: Logging Flink metrics

Chesnay Schepler
You've said elsewhere that you do see some metrics in Prometheus; which ones are those?

Why are you configuring the host for the prometheus reporter? This option is only for the PrometheusPushGatewayReporter.
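
For the plain PrometheusReporter, a configuration roughly like this should be enough (the port range here is just an example):

metrics.reporter.prom.class: org.apache.flink.metrics.prometheus.PrometheusReporter
metrics.reporter.prom.port: 9250-9260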


Re: Logging Flink metrics

Manish G
The metrics I see in Prometheus are like:
# HELP flink_jobmanager_job_lastCheckpointRestoreTimestamp lastCheckpointRestoreTimestamp (scope: jobmanager_job)
# TYPE flink_jobmanager_job_lastCheckpointRestoreTimestamp gauge
flink_jobmanager_job_lastCheckpointRestoreTimestamp{job_id="58483036154d7f72ad1bbf10eb86bc2e",host="localhost",job_name="frauddetection",} -1.0
# HELP flink_jobmanager_job_numberOfFailedCheckpoints numberOfFailedCheckpoints (scope: jobmanager_job)
# TYPE flink_jobmanager_job_numberOfFailedCheckpoints gauge
flink_jobmanager_job_numberOfFailedCheckpoints{job_id="58483036154d7f72ad1bbf10eb86bc2e",host="localhost",job_name="frauddetection",} 0.0
# HELP flink_jobmanager_Status_JVM_Memory_Heap_Max Max (scope: jobmanager_Status_JVM_Memory_Heap)
# TYPE flink_jobmanager_Status_JVM_Memory_Heap_Max gauge
flink_jobmanager_Status_JVM_Memory_Heap_Max{host="localhost",} 1.029177344E9
# HELP flink_jobmanager_Status_JVM_GarbageCollector_PS_MarkSweep_Count Count (scope: jobmanager_Status_JVM_GarbageCollector_PS_MarkSweep)
# TYPE flink_jobmanager_Status_JVM_GarbageCollector_PS_MarkSweep_Count gauge
flink_jobmanager_Status_JVM_GarbageCollector_PS_MarkSweep_Count{host="localhost",} 2.0
# HELP flink_jobmanager_Status_JVM_CPU_Time Time (scope: jobmanager_Status_JVM_CPU)
# TYPE flink_jobmanager_Status_JVM_CPU_Time gauge
flink_jobmanager_Status_JVM_CPU_Time{host="localhost",} 8.42E9
# HELP flink_jobmanager_Status_JVM_Memory_Direct_TotalCapacity TotalCapacity (scope: jobmanager_Status_JVM_Memory_Direct)
# TYPE flink_jobmanager_Status_JVM_Memory_Direct_TotalCapacity gauge
flink_jobmanager_Status_JVM_Memory_Direct_TotalCapacity{host="localhost",} 604064.0
# HELP flink_jobmanager_job_fullRestarts fullRestarts (scope: jobmanager_job)
# TYPE flink_jobmanager_job_fullRestarts gauge
flink_jobmanager_job_fullRestarts{job_id="58483036154d7f72ad1bbf10eb86bc2e",host="localhost",job_name="frauddetection",} 0.0



Re: Logging Flink metrics

Chesnay Schepler
These are all JobManager metrics; have you configured Prometheus to also scrape the TaskManager processes?


Re: Logging Flink metrics

Manish G
In flink-conf.yaml:
metrics.reporter.prom.port: 9250-9260

This is based on the Prometheus reporter documentation:
port - (optional) the port the Prometheus exporter listens on, defaults to 9249. In order to be able to run several instances of the reporter on one host (e.g. when one TaskManager is colocated with the JobManager) it is advisable to use a port range like 9250-9260.

As I am running Flink locally, both the JobManager and the TaskManager are colocated.

In prometheus.yml:
- job_name: 'flinkprometheus'
    scrape_interval: 5s
    static_configs:
      - targets: ['localhost:9250', 'localhost:9251']
    metrics_path: /

This is the whole configuration I have, put together from several tutorials and blogs available online.





Re: Logging Flink metrics

Chesnay Schepler
Are you running Flink in WSL by chance?


Re: Logging Flink metrics

Manish G
Yes.


Re: Logging Flink metrics

Chesnay Schepler
WSL is a bit buggy when it comes to allocating ports; it happily lets two processes create sockets on the same port, except that the latter one doesn't do anything.
Super annoying, and I haven't found a solution to that myself yet.

You'll have to configure the ports explicitly for the JM/TM, which will likely entail manually starting the processes and updating the configuration in-between, e.g.:

./bin/jobmanager.sh start
<update port in config>
./bin/taskmanager.sh start
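
For example (the ports here are just illustrative values from the 9250-9260 range you configured), you could set

metrics.reporter.prom.port: 9250

before starting the JobManager, then change it to 9251 before starting the TaskManager, so that each process ends up bound to its own port.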


Re: Logging Flink metrics

Manish G
OK, got it.
I will try doing it manually.

Thanks a lot for your inputs and efforts.

With regards
