Monitor the Flink

Monitor the Flink

penguin.

Hello,

In a Flink cluster, how can I monitor each task slot of a TaskManager? For example, the CPU and memory usage of each slot, and the traffic between slots.

How can I get the traffic between nodes?

Thank you very much!


penguin



 


Re: Monitor the Flink

Yangze Guo
Hi,

First of all, there is currently no resource isolation between
operators/tasks within a slot, except for managed memory. So
monitoring individual tasks might not be meaningful.

Regarding TM/JM-level CPU/memory metrics, you can refer to [1] and
[2]. Regarding the traffic between tasks, you can refer to [3].

[1] https://ci.apache.org/projects/flink/flink-docs-release-1.12/ops/metrics.html#cpu
[2] https://ci.apache.org/projects/flink/flink-docs-release-1.12/ops/metrics.html#memory
[3] https://ci.apache.org/projects/flink/flink-docs-release-1.12/ops/metrics.html#default-shuffle-service
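
As a rough illustration of reading the TM-level metrics from [1] and [2], here is a minimal Python sketch that queries them through Flink's REST API. The base URL and TaskManager ID are placeholders you would supply yourself, the helper names are made up for this example, and the metric list is only a small selection from the documentation.

```python
import json
from urllib.parse import quote
from urllib.request import urlopen

# A few TaskManager metric names from the Flink metrics documentation.
TM_METRICS = [
    "Status.JVM.CPU.Load",
    "Status.JVM.Memory.Heap.Used",
    "Status.JVM.Memory.NonHeap.Used",
]


def tm_metrics_url(base_url, tm_id, names):
    """Build the REST URL that asks one TaskManager for the given metrics."""
    return "{}/taskmanagers/{}/metrics?get={}".format(
        base_url.rstrip("/"), quote(tm_id, safe=""), ",".join(names)
    )


def parse_metrics(payload):
    """Convert the response, a JSON list of {"id", "value"} objects, to a dict of floats."""
    return {entry["id"]: float(entry["value"]) for entry in json.loads(payload)}


def fetch_tm_metrics(base_url, tm_id, names=TM_METRICS):
    """Fetch and parse the metrics (hypothetical helper; needs a reachable cluster)."""
    with urlopen(tm_metrics_url(base_url, tm_id, names), timeout=10) as resp:
        return parse_metrics(resp.read().decode("utf-8"))
```

With a running cluster, something like `fetch_tm_metrics("http://localhost:8081", tm_id)` should return one value per requested metric, where `tm_id` is one of the IDs listed under `/taskmanagers`. These values describe the whole TaskManager process, not a single slot.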

Best,
Yangze Guo


Re: Monitor the Flink

Piotr Nowojski-4
Hi Penguin,

Building on top of Yangze's response, you can also take a look at the more detailed system resource usage metrics [1], which become available after adding an optional dependency to the classpath/lib directory.

Regarding the single task/task slot metrics, as Yangze noted, there is "almost" no isolation of resources between tasks (task slots). Almost, because there is one thing to note: most of Flink's tasks are single-threaded, and you can monitor how busy that single thread is using the `idleTimeMsPerSecond` metric [2] (added in Flink 1.11). In Flink 1.13 this metric will change slightly, as it will be split into two: `idleTimeMsPerSecond` and `backPressuredTimeMsPerSecond`. Those two will additionally be complemented by the `busyTimeMsPerSecond` metric [3][4][5], and all of these metrics will be easily accessible in the web UI [6].

I wrote "most of Flink's tasks are single-threaded" because there are a couple of caveats:
- Network communication is done in a separate pool of threads.
- Old-style sources (those using the `SourceFunction` primitive, so basically all sources apart from a couple of new ones introduced in Flink 1.12) spawn another dedicated thread, which is not covered by those busy/idle time metrics.
- If an operator or user code spawns its own threads somehow, those are also completely ignored (this includes the built-in AsyncWaitOperator [7]).
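
To make the interpretation of these per-second metrics concrete, here is a small Python sketch that turns them into fractions of one second. It assumes that idle, back-pressured and busy time add up to roughly 1000 ms for the task's main thread; the function name is made up for this example. With only `idleTimeMsPerSecond` available (Flink 1.11/1.12), everything that is not idle is counted as busy.

```python
def time_breakdown(idle_ms, back_pressured_ms=0.0):
    """Split one second of a task thread into idle/back-pressured/busy fractions.

    Assumption: the three parts sum to 1000 ms. Work done by other threads
    (network pool, legacy source threads, user-spawned threads) is not included.
    """
    # Clamp inputs so noisy samples cannot produce negative fractions.
    idle = min(max(idle_ms, 0.0), 1000.0)
    back_pressured = min(max(back_pressured_ms, 0.0), 1000.0 - idle)
    busy = 1000.0 - idle - back_pressured
    return {
        "idle": idle / 1000.0,
        "backPressured": back_pressured / 1000.0,
        "busy": busy / 1000.0,
    }


# Only idle time known: 250 ms idle means about 75% busy.
print(time_breakdown(250.0))
# Idle and back-pressured time known: 100 ms idle, 400 ms back-pressured.
print(time_breakdown(100.0, 400.0))
```

Where `busyTimeMsPerSecond` is reported directly, reading it is preferable to deriving it this way.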

Best,
Piotrek

