EOF on scraping flink metrics

EOF on scraping flink metrics

Vishal Santoshi
A simple question: does the /metrics route just read an in-memory registry of collected stats, or does it contend with access from the JM or perform expensive access or computation? I occasionally see our Prometheus scrape fail with the error pasted below. We have had the scraper do much more elaborate scrapes on other systems we maintain, so I was curious. The server did not have any logs related to the exception. The scraper is a ServiceMonitor from k8s, and of course these TMs are hosted on k8s as well.

Get http://10.246.254.84:9610/metrics: EOF

Re: EOF on scraping flink metrics

Chesnay Schepler
Since you're using Prometheus, I would recommend setting up a PrometheusReporter as described in the metrics documentation and scraping each JM/TM individually. Scraping through the REST API is more expensive and you lose out on a lot of features.
The REST API calls are primarily aimed at the WebUI.

Regardless, as of right now I doubt that this is a Flink issue, and would recommend heading to the Prometheus mailing lists.
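
One way to narrow down whether the EOF comes from the endpoint itself or from the scraping side is to poll the endpoint directly, outside Prometheus, and record every failure. The following is only a minimal sketch using the Python standard library; the helper names `probe_once` and `probe_many` are made up for illustration, and the URL in the usage note is the one from the error message in this thread.

```python
import http.client
import time
import urllib.error
import urllib.request


def probe_once(url, timeout=5.0):
    """Fetch the URL once and return (ok, detail)."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            body = resp.read()
            return True, f"HTTP {resp.status}, {len(body)} bytes"
    except (urllib.error.URLError, http.client.HTTPException, OSError) as exc:
        # Covers refused connections, timeouts, non-2xx responses and
        # connections that the server closes before sending a full response.
        return False, f"{type(exc).__name__}: {exc}"


def probe_many(url, attempts=10, pause=1.0, timeout=5.0):
    """Probe the URL repeatedly; return a list of (attempt_index, detail) failures."""
    failures = []
    for i in range(attempts):
        ok, detail = probe_once(url, timeout)
        if not ok:
            failures.append((i, detail))
        if i + 1 < attempts:
            time.sleep(pause)
    return failures
```

For example, `probe_many("http://10.246.254.84:9610/metrics", attempts=60, pause=1.0)` run from a pod in the same cluster would show whether the failures also occur without Prometheus in the path. An empty result over a long run would point away from the endpoint, though it does not prove anything on its own.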

On 22/03/2019 17:55, Vishal Santoshi wrote:
A simple question: does the /metrics route just read an in-memory registry of collected stats, or does it contend with access from the JM or perform expensive access or computation? I occasionally see our Prometheus scrape fail with the error pasted below. We have had the scraper do much more elaborate scrapes on other systems we maintain, so I was curious. The server did not have any logs related to the exception. The scraper is a ServiceMonitor from k8s, and of course these TMs are hosted on k8s as well.

Get http://10.246.254.84:9610/metrics: EOF


Re: EOF on scraping flink metrics

Vishal Santoshi
Thank you.

What might I be doing wrong?

metrics.reporters: prom
metrics.reporter.prom.port: 9610
metrics.reporter.prom.class: org.apache.flink.metrics.prometheus.PrometheusReporter

and the scrape uses a ServiceMonitor:
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
endpoints:
- port: metrics  # named port exposed in the k8s service; it is 9610
  scheme: http
  path: /metrics
  interval: 60s
  scrapeTimeout: 59s
selector:....

Regards.
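
One thing that can be ruled out quickly for the ServiceMonitor above is the timing pair: Prometheus rejects a scrape timeout that is longer than the scrape interval, so the two values need to be compared in the same unit. The sketch below is a small, hypothetical checker (the names `duration_ms` and `timeout_fits` are invented here, and it handles only a subset of the duration units Prometheus accepts); it is not part of Flink or Prometheus.

```python
import re

# Milliseconds per unit; a subset of the units Prometheus accepts.
_UNITS_MS = {"ms": 1, "s": 1000, "m": 60_000, "h": 3_600_000, "d": 86_400_000}
_TOKEN = re.compile(r"(\d+)(ms|s|m|h|d)")


def duration_ms(text):
    """Convert a duration such as '60s' or '1m30s' to milliseconds."""
    pos, total = 0, 0
    for match in _TOKEN.finditer(text):
        if match.start() != pos:
            raise ValueError(f"unexpected characters in duration: {text!r}")
        total += int(match.group(1)) * _UNITS_MS[match.group(2)]
        pos = match.end()
    if pos == 0 or pos != len(text):
        raise ValueError(f"not a valid duration: {text!r}")
    return total


def timeout_fits(interval, scrape_timeout):
    """Return True if the scrape timeout does not exceed the interval."""
    return duration_ms(scrape_timeout) <= duration_ms(interval)
```

With the values from the post, `timeout_fits("60s", "59s")` holds, so the interval and timeout are at least consistent with each other and are unlikely to be the source of the EOF by themselves.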