Hello,
I am using Flink on YARN and could not understand from the documentation how to read the default metrics via code. In particular, I want to read throughput (the Task/Operator numRecordsOutPerSecond metric), CPU usage, and memory. Is there any sample code for reading such default metrics?

Is there any way to query the default metrics, such as CPU usage and memory, without using the REST API or reporters?

Additionally, how do I query backpressure using code, or is it still only available visually via the dashboard UI? Alternatively, is there any way to infer backpressure by querying one (or more) of the memory metrics of the TaskManager?

Thank you,
Pankaj
Hi Pankaj,

> Is there any sample code for how to read such default metrics? Is there any way to query the default metrics, such as CPU usage and Memory, without using REST API or Reporters?

What is your actual requirement? Can you call the REST API from code? Why does that not meet your needs?

> Additionally, how do I query Backpressure using code, or is it still only visually available via the dashboard UI? Consequently, is there any way to infer Backpressure by querying one (or more) of the Memory metrics of the TaskManager?

Backpressure is related not only to memory metrics but also to IO and network metrics; for more details about measuring backpressure, please see these blog posts. [1][2]

Best,
Vino

Pankaj Chand <[hidden email]> wrote on Mon, Dec 9, 2019 at 12:07 PM:
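Calling the REST API from code, as suggested above, could look like the sketch below. This is a minimal Python example using only the standard library; the host/port, the TaskManager ID, and the assumption that the metrics endpoint returns a JSON array of `{"id": ..., "value": ...}` objects are based on Flink's documented REST metrics endpoints and should be checked against your cluster:

```python
import json
import urllib.request

# Hypothetical address of the Flink JobManager REST endpoint; adjust for your cluster.
FLINK_REST = "http://localhost:8081"

def metrics_url(tm_id: str, names: str) -> str:
    """Build the URL for selected TaskManager metrics, e.g.
    names = "Status.JVM.CPU.Load,Status.JVM.Memory.Heap.Used"."""
    return f"{FLINK_REST}/taskmanagers/{tm_id}/metrics?get={names}"

def parse_metrics(body: str) -> dict:
    """The endpoint returns a JSON array of {"id": ..., "value": ...}
    objects; turn it into a plain {id: value} dict, skipping entries
    that have no "value" yet."""
    return {m["id"]: m["value"] for m in json.loads(body) if "value" in m}

def get_tm_metrics(tm_id: str, names: str) -> dict:
    """Fetch and parse the metrics (requires a running cluster)."""
    with urllib.request.urlopen(metrics_url(tm_id, names)) as resp:
        return parse_metrics(resp.read().decode("utf-8"))
```

The same approach works for the job- and vertex-level metric endpoints; only the URL path changes.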
Hi Vino,

Thank you for the links regarding backpressure! I am currently getting metrics from code by calling the REST API via curl. However, the REST API often returns an empty JSON object/array; piped through jq (for filtering the JSON), that produces a null value, which breaks my code. For example, in a YARN session-mode cluster, the metric query "metrics?get=Status.JVM.CPU.Load" seemingly at random returns either an empty JSON object/array or an actual value. Is it possible that for CPU load the empty JSON object is returned when the job was started less than ~10 seconds ago?

Thanks,
Pankaj

On Mon, Dec 9, 2019 at 4:21 AM vino yang <[hidden email]> wrote:
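One way to keep an empty response from breaking downstream code is to treat "metric missing" as a normal case rather than an error, instead of letting a jq-style null propagate. A minimal sketch in Python (the response shape is assumed to be the JSON array the metrics endpoints normally return):

```python
import json

def metric_value(body: str, metric_id: str):
    """Return the value of one metric from a raw metrics response body,
    or None if the body is empty, malformed, or the metric is not (yet)
    present."""
    try:
        entries = json.loads(body)
    except (json.JSONDecodeError, TypeError):
        return None
    if not isinstance(entries, list):
        return None
    for e in entries:
        if e.get("id") == metric_id and "value" in e:
            return e["value"]
    return None

print(metric_value("[]", "Status.JVM.CPU.Load"))  # None (empty response)
print(metric_value('[{"id": "Status.JVM.CPU.Load", "value": "0.07"}]',
                   "Status.JVM.CPU.Load"))        # 0.07
```

The caller can then decide whether a None means "retry later" or "skip this sample", rather than crashing on a null.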
Yes, when a cluster is started it takes a few seconds for (any) metrics to become available.

On 12/12/2019 11:36, Pankaj Chand wrote:
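Given that metrics only appear a few seconds after startup, one option is to poll until they show up instead of reading once. A sketch (the `fetch` callable stands in for whatever performs the actual curl/HTTP request; the attempt count and delay are arbitrary):

```python
import json
import time

def poll_metric(fetch, metric_id, attempts=10, delay=2.0):
    """Call fetch() -- any callable returning a raw metrics response
    body -- until the metric appears or we give up. Returns the value,
    or None after `attempts` tries."""
    for _ in range(attempts):
        try:
            entries = json.loads(fetch())
        except json.JSONDecodeError:
            entries = []
        for e in entries:
            if e.get("id") == metric_id and "value" in e:
                return e["value"]
        time.sleep(delay)
    return None
```

This also smooths over the transient empty responses seen right after job submission, since those simply cause another attempt.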
Thank you, Chesnay!

On Thu, Dec 12, 2019 at 5:46 AM Chesnay Schepler <[hidden email]> wrote:
Additionally, when an old job completes and I run a new job on the Flink YARN session-mode cluster, if I query for metrics before they become available for the new job, I sometimes get the last metrics of the old job instead. This happens even if I wait for the TaskManager to be released by Flink (as shown in Flink's dashboard web UI). This shouldn't happen, since the TaskManager ID "should" be different, even though it would have the old index in the TaskManagers list. Would this be a bug?

Thanks!
Pankaj

On Thu, Dec 12, 2019 at 5:59 AM Pankaj Chand <[hidden email]> wrote:
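One way to guard against reading a stale container's metrics would be to snapshot the TaskManager IDs before submitting the new job and only query containers whose ID was not in that snapshot. A sketch, assuming the `{"taskmanagers": [{"id": ...}, ...]}` shape of the GET /taskmanagers response (whether the IDs actually differ between jobs is exactly the open question here):

```python
import json

def taskmanager_ids(body: str) -> set:
    """Extract the set of TaskManager IDs from a GET /taskmanagers
    response of the form {"taskmanagers": [{"id": ...}, ...]}."""
    return {tm["id"] for tm in json.loads(body).get("taskmanagers", [])}

def new_taskmanagers(before_body: str, after_body: str) -> set:
    """IDs present after the new job started but not before it:
    metrics read from these containers cannot belong to the old job."""
    return taskmanager_ids(after_body) - taskmanager_ids(before_body)
```

If the returned set is empty, the new job is still running on a container that existed before, and extra care (or a delay) is needed before trusting its metrics.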