Hi all,
I have a question regarding the Monitoring REST API.
I want to analyze the behavior of my program with regard to I/O MiB/s, Network MiB/s and CPU %, as the authors of this paper did (https://hal.inria.fr/hal-01347638v2/document).
From the JSON file at http://master:8081/jobs/jobid/ I get a summary including the read/write records and read/write bytes.
Unfortunately, the entries for Network and CPU are either (unknown) or 0.0. I am running my program on a cluster with up to 32 nodes.
Where can I find the values for, e.g., CPU or Network?
Thanks in advance!
Lydia
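For readers following along, here is a minimal Python sketch of pulling the per-job summary mentioned above and totalling its read/write counters. It assumes the response has a top-level "vertices" list whose entries carry a "metrics" object with "read-bytes", "write-bytes", "read-records" and "write-records" keys; that layout is an assumption and may differ between Flink versions. The host name master, the port 8081 and all helper names are placeholders.

```python
import json
from urllib.request import urlopen

# Assumed counter names; check them against the JSON your Flink version returns.
COUNTERS = ("read-bytes", "write-bytes", "read-records", "write-records")


def job_url(job_id, host="master", port=8081):
    """Build the job summary URL (host and port are placeholders)."""
    return f"http://{host}:{port}/jobs/{job_id}"


def fetch_job(job_id, host="master", port=8081):
    """Download and decode the job summary JSON."""
    with urlopen(job_url(job_id, host, port)) as response:
        return json.load(response)


def total_counters(job):
    """Sum each counter over all vertices, skipping missing or non-numeric values."""
    totals = dict.fromkeys(COUNTERS, 0)
    for vertex in job.get("vertices", []):
        metrics = vertex.get("metrics", {})
        for name in COUNTERS:
            value = metrics.get(name)
            if isinstance(value, (int, float)):
                totals[name] += value
    return totals
```

Dividing the byte totals by the job duration would give only a job-wide average rate, not a per-second time series, and these counters say nothing about CPU usage.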
Hi Lydia,
I have used sar monitoring (sar -u -n DEV -p -d -r 1) and plotted the average over multiple nodes.

1) So for each node you can collect the sar output and obtain, for example:

    12:54:09  CPU  %user  %nice  %system  %iowait  %steal  %idle
    12:54:10  all   4.63   0.00     3.25     0.13    0.00  91.99

    12:54:09  kbmemfree  kbmemused  %memused  kbbuffers  kbcached  kbcommit  %commit  kbactive  kbinact
    12:54:10  129538812    2525308      1.91       1292     85876   3662636     2.69   2111652    55132

    12:54:09  DEV    tps  rd_sec/s  wr_sec/s  avgrq-sz  avgqu-sz  await  svctm  %util
    12:54:10  sda  28.71   2708.91     87.13     97.38      0.03   1.10   0.97   2.77

    12:54:09  IFACE  rxpck/s  txpck/s   rxkB/s  txkB/s  rxcmp/s  txcmp/s  rxmcst/s
    12:54:10  eth0    632.67   587.13  3173.60   58.47     0.00     0.00      0.00

2) Calculate the average over your nodes (sync clocks) and obtain a final output over which you run some plot scripts:

    LINE  DATE      FILENAME  CPU_user  CPU_SYS  KBMEMFREE  KBMEMUSED  MEMUSED  DISK_UTIL  DISK_RKBs  DISK_WKBs  _IO_RSTs  _IO_WSTs
    1     12:54:10  res1Avg       6.12     1.25  129554704    2509412     1.90       6.00    4253.63      87.04   3944.00     88.00
    2     12:54:11  res1Avg       3.41     0.28  129523432    2540690     1.92       4.00    2335.82      51.62   2692.00      0.00
    3     12:54:12  res1Avg       0.06     0.03  129522000    2542120     1.92       1.60       0.16       0.59   2048.00     32.00
    4     12:54:13  res1Avg       0.09     0.06  129520936    2543182     1.92       0.60       0.19       0.59   2048.00      0.00
    5     12:54:14  res1Avg       0.06     0.06  129518448    2545670     1.93       6.80       4.31     169.47   4044.00     16.00

For other metrics specific to Flink’s execution you may need to rely on various metrics Flink is currently exposing.

Best,
Ovidiu
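The averaging over nodes described in this reply could be sketched in Python roughly as follows. This is not Ovidiu's script; it assumes the sar output of each node has already been parsed into a dictionary keyed by timestamp, and all function names are hypothetical. The unit helpers assume that sar reports rd_sec/s in 512-byte sectors and rxkB/s in kilobytes of 1024 bytes, which is worth checking against the sar man page of your sysstat version.

```python
from collections import defaultdict


def average_over_nodes(per_node):
    """Average each metric across nodes, separately for every timestamp.

    per_node maps node name -> {timestamp: {metric name: value}}.
    Clocks are assumed to be synchronized so that timestamps line up.
    """
    sums = defaultdict(lambda: defaultdict(float))
    counts = defaultdict(lambda: defaultdict(int))
    for samples in per_node.values():
        for timestamp, metrics in samples.items():
            for name, value in metrics.items():
                sums[timestamp][name] += value
                counts[timestamp][name] += 1
    # Divide by the number of nodes that actually reported each metric.
    return {
        timestamp: {
            name: sums[timestamp][name] / counts[timestamp][name]
            for name in sums[timestamp]
        }
        for timestamp in sorted(sums)
    }


def sectors_to_mib(sectors_per_s, sector_bytes=512):
    """Convert sectors/s to MiB/s (assumes 512-byte sectors)."""
    return sectors_per_s * sector_bytes / (1024 * 1024)


def kb_to_mib(kb_per_s):
    """Convert kB/s to MiB/s (assumes 1 kB = 1024 bytes)."""
    return kb_per_s / 1024
```

Calling average_over_nodes on the parsed samples of all nodes yields one averaged row per timestamp, which can then be converted to MiB/s with the two helpers and handed to a plotting script.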
Although Flink exposes some metrics in the API/UI, it probably only does that because it was easy to do and convenient for users. However, I don't think Flink is intended to be a complete monitoring solution for your cluster.
Instead, you should take a look at collectd (https://collectd.org/), which is meant for monitoring OS-level metrics and has, for example, a Graphite plugin that you can use to write to a Graphite server or statsd instance. Alternatively, you can integrate it some other way, depending on what you have and what you want.
-Shannon
From: Lydia Ickler <[hidden email]>
Date: Wednesday, December 21, 2016 at 12:55 PM
To: <[hidden email]>
Subject: Monitoring REST API