Hi,
I built the Prometheus reporter package from this PR, https://github.com/apache/flink/pull/4586, and used it on Flink 1.3.2 to record all the default metrics as well as those from `FlinkKafkaConsumer`. At first, everything was fine: Prometheus could scrape the TM metrics, and they matched what I saw in the Flink Web UI. However, when I turned to the JM, Prometheus reported this error: Get http://localhost:9249/metrics: EOF. I checked the JM log and found nothing in it. There was no error message, and port 9249 was still open.

To figure out what happened, I created another cluster and found that Prometheus could connect to the Flink cluster as long as no job was running. After the JM triggered or completed the first checkpoint, Prometheus started getting ERR_EMPTY_RESPONSE from the JM, but not from the TM. There was still no error in the log file, and port 9249 was still open.

Where does the error occur: in Flink or in the Prometheus reporter? Or is it incorrect to use the Prometheus reporter on Flink 1.3.2? Thank you. Best Regards, Tony Wei
|
The Prometheus reporter should work with 1.3.2.
Does this also occur with the reporter that currently exists in 1.4? (That would rule out new bugs from the PR.) To investigate this further, please set the logging level to WARN and try again, as all errors in the metric system are logged at that level. On 22.09.2017 10:33, Tony Wei wrote:
|
Hi Chesnay, I haven't tried it on 1.4, so I don't know whether this also occurs there. As for my logging setting, it was already at INFO level, but there were no errors or warnings in the log file either. Best Regards, Tony Wei 2017-09-22 22:07 GMT+08:00 Chesnay Schepler <[hidden email]>:
|
Hi Chesnay, I built another Flink cluster using version 1.4, set the log level to DEBUG, and found that the root cause might be this exception: java.lang. I updated `CheckpointStatsTracker` to ignore the external path when it is null, and the exception did not happen again. The Prometheus reporter now works as well. I have created a Jira issue for it, https://issues.apache.org/jira/browse/FLINK-7675, and I will submit the PR once it passes Travis CI on my repository. Best Regards, Tony Wei 2017-09-22 22:20 GMT+08:00 Tony Wei <[hidden email]>:
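For readers following along, here is a rough sketch of the kind of null guard described above. The thread does not show the actual patch or the full exception, so this assumes the failure came from a gauge handing a null external path to the reporter. `ExternalPathGaugeSketch`, `CompletedCheckpointStats`, `externalPathOrPlaceholder` and the `"n/a"` placeholder are hypothetical names chosen for illustration, not Flink's real classes or the real fix.

```java
import java.util.function.Supplier;

/** Illustrative only: every name here is hypothetical and not taken from Flink. */
class ExternalPathGaugeSketch {

    /** Stand-in for the stats of the latest completed checkpoint. */
    static final class CompletedCheckpointStats {
        // Assumed to be null when the checkpoint was not externalized.
        private final String externalPath;

        CompletedCheckpointStats(String externalPath) {
            this.externalPath = externalPath;
        }

        String getExternalPath() {
            return externalPath;
        }
    }

    /** Returns a placeholder instead of null, so a reporter never receives a null value. */
    static String externalPathOrPlaceholder(CompletedCheckpointStats latest) {
        if (latest != null && latest.getExternalPath() != null) {
            return latest.getExternalPath();
        }
        return "n/a";
    }

    public static void main(String[] args) {
        // A gauge is modeled here as a plain Supplier; a real metric system has its own interface.
        Supplier<String> gauge =
                () -> externalPathOrPlaceholder(new CompletedCheckpointStats(null));
        System.out.println(gauge.get());
    }
}
```

Whether to return a placeholder or skip the metric entirely is a design choice; the placeholder keeps the metric present for every reporter.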
|
Hi Tony,
thanks for troubleshooting this. I have added a commit to https://github.com/apache/flink/pull/4586 that should enable you to use the reporter with 1.3.2 as well. Best regards, Max
|
Hi Max, Good to know. Thanks very much. Best Regards, Tony Wei 2017-10-24 13:52 GMT+08:00 Maximilian Bode <[hidden email]>:
|