Hello,
I am using Flink on YARN and could not understand from the documentation how to read the default metrics via code. In particular, I want to read throughput (the Task/Operator numRecordsOutPerSecond metric), CPU usage, and memory. Is there any sample code for reading such default metrics?

Is there any way to query the default metrics, such as CPU usage and memory, without using the REST API or reporters?

Additionally, how do I query backpressure using code, or is it still only available visually via the dashboard UI? Alternatively, is there any way to infer backpressure by querying one (or more) of the memory metrics of the TaskManager?

Thank you,
Pankaj
Hi Pankaj,

> Is there any sample code for how to read such default metrics? Is there any way to query the default metrics, such as CPU usage and Memory, without using REST API or Reporters?

What is your actual requirement? Can you call the REST API from code? Why does that not meet your needs?

> Additionally, how do I query Backpressure using code, or is it still only visually available via the dashboard UI? Consequently, is there any way to infer Backpressure by querying one (or more) of the Memory metrics of the TaskManager?

Backpressure is related not only to memory metrics but also to IO and network metrics; for more details about measuring backpressure, please see these blog posts. [1][2]

Best,
Vino

Pankaj Chand <[hidden email]> wrote on Mon, Dec 9, 2019 at 12:07 PM:
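Calling the REST API from code, as suggested above, could look like the sketch below. This is a minimal Python example using only the standard library; the host/port, the TaskManager ID, and the assumption that the metrics endpoint returns a JSON array of `{"id": ..., "value": ...}` objects are based on Flink's documented REST metrics endpoints and should be checked against your cluster:

```python
import json
import urllib.request

# Hypothetical address of the Flink JobManager REST endpoint; adjust for your cluster.
FLINK_REST = "http://localhost:8081"

def metrics_url(tm_id: str, names: str) -> str:
    """Build the URL for selected TaskManager metrics, e.g.
    names = "Status.JVM.CPU.Load,Status.JVM.Memory.Heap.Used"."""
    return f"{FLINK_REST}/taskmanagers/{tm_id}/metrics?get={names}"

def parse_metrics(body: str) -> dict:
    """The endpoint returns a JSON array of {"id": ..., "value": ...}
    objects; turn it into a plain {id: value} dict, skipping entries
    that have no "value" yet."""
    return {m["id"]: m["value"] for m in json.loads(body) if "value" in m}

def get_tm_metrics(tm_id: str, names: str) -> dict:
    """Fetch and parse the metrics (requires a running cluster)."""
    with urllib.request.urlopen(metrics_url(tm_id, names)) as resp:
        return parse_metrics(resp.read().decode("utf-8"))
```

The same approach works for the job- and vertex-level metric endpoints; only the URL path changes.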
Hi Vino,

Thank you for the links regarding backpressure! I am currently getting metrics from code by calling the REST API via curl. However, the REST API often returns an empty JSON object/array; piped through jq (for filtering the JSON), that produces a null value, which breaks my code. For example, in a YARN session-mode cluster, the metric query "metrics?get=Status.JVM.CPU.Load" seemingly at random returns either an empty JSON object/array or an actual value. Is it possible that for CPU load the empty JSON object is returned when the job was started less than ~10 seconds ago?

Thanks,
Pankaj

On Mon, Dec 9, 2019 at 4:21 AM vino yang <[hidden email]> wrote:
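One way to keep an empty response from breaking downstream code is to treat "metric missing" as a normal case rather than an error, instead of letting a jq-style null propagate. A minimal sketch in Python (the response shape is assumed to be the JSON array the metrics endpoints normally return):

```python
import json

def metric_value(body: str, metric_id: str):
    """Return the value of one metric from a raw metrics response body,
    or None if the body is empty, malformed, or the metric is not (yet)
    present."""
    try:
        entries = json.loads(body)
    except (json.JSONDecodeError, TypeError):
        return None
    if not isinstance(entries, list):
        return None
    for e in entries:
        if e.get("id") == metric_id and "value" in e:
            return e["value"]
    return None

print(metric_value("[]", "Status.JVM.CPU.Load"))  # None (empty response)
print(metric_value('[{"id": "Status.JVM.CPU.Load", "value": "0.07"}]',
                   "Status.JVM.CPU.Load"))        # 0.07
```

The caller can then decide whether a None means "retry later" or "skip this sample", rather than crashing on a null.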
Yes, when a cluster is started it takes a few seconds for (any) metrics to become available.

On 12/12/2019 11:36, Pankaj Chand wrote:
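Given that metrics only appear a few seconds after startup, one option is to poll until they show up instead of reading once. A sketch (the `fetch` callable stands in for whatever performs the actual curl/HTTP request; the attempt count and delay are arbitrary):

```python
import json
import time

def poll_metric(fetch, metric_id, attempts=10, delay=2.0):
    """Call fetch() -- any callable returning a raw metrics response
    body -- until the metric appears or we give up. Returns the value,
    or None after `attempts` tries."""
    for _ in range(attempts):
        try:
            entries = json.loads(fetch())
        except json.JSONDecodeError:
            entries = []
        for e in entries:
            if e.get("id") == metric_id and "value" in e:
                return e["value"]
        time.sleep(delay)
    return None
```

This also smooths over the transient empty responses seen right after job submission, since those simply cause another attempt.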
Thank you, Chesnay!

On Thu, Dec 12, 2019 at 5:46 AM Chesnay Schepler <[hidden email]> wrote:
Additionally, when an old job completes and I run a new job on the Flink YARN session-mode cluster, if I query for metrics before they become available for the new job, I sometimes get the last metrics of the old job instead. This happens even if I wait for the TaskManager to be released by Flink (as shown in Flink's dashboard web UI). This shouldn't happen, since the TaskManager ID "should" be different, even though it would have the old index in the TaskManagers list. Would this be a bug?

Thanks!
Pankaj

On Thu, Dec 12, 2019 at 5:59 AM Pankaj Chand <[hidden email]> wrote:
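One way to guard against reading a stale container's metrics would be to snapshot the TaskManager IDs before submitting the new job and only query containers whose ID was not in that snapshot. A sketch, assuming the `{"taskmanagers": [{"id": ...}, ...]}` shape of the GET /taskmanagers response (whether the IDs actually differ between jobs is exactly the open question here):

```python
import json

def taskmanager_ids(body: str) -> set:
    """Extract the set of TaskManager IDs from a GET /taskmanagers
    response of the form {"taskmanagers": [{"id": ...}, ...]}."""
    return {tm["id"] for tm in json.loads(body).get("taskmanagers", [])}

def new_taskmanagers(before_body: str, after_body: str) -> set:
    """IDs present after the new job started but not before it:
    metrics read from these containers cannot belong to the old job."""
    return taskmanager_ids(after_body) - taskmanager_ids(before_body)
```

If the returned set is empty, the new job is still running on a container that existed before, and extra care (or a delay) is needed before trusting its metrics.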