Hello,

In a Flink cluster, how can I monitor each task slot of a TaskManager? For example, the CPU and memory usage of each slot and the traffic between slots. Also, how can I get the traffic between nodes?

Thank you very much!

penguin
Hi,
First of all, there is currently no resource isolation between operators/tasks within a slot, except for managed memory. So monitoring the CPU or memory of individual tasks might not be meaningful.

Regarding TM/JM level CPU/memory metrics, you can refer to [1] and [2].
Regarding the traffic between tasks, you can refer to [3].

[1] https://ci.apache.org/projects/flink/flink-docs-release-1.12/ops/metrics.html#cpu
[2] https://ci.apache.org/projects/flink/flink-docs-release-1.12/ops/metrics.html#memory
[3] https://ci.apache.org/projects/flink/flink-docs-release-1.12/ops/metrics.html#default-shuffle-service

Best,
Yangze Guo

On Sun, Jan 17, 2021 at 6:43 PM penguin. <[hidden email]> wrote:
>
> Hello,
>
> In the Flink cluster,
> How to monitor each taskslot of taskmanager? For example, the CPU and memory usage of each slot and the traffic between slots.
> What is the way to get the traffic between nodes?
>
> thank you very much!
>
> penguin
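As a rough sketch of how the TM-level metrics mentioned above could be read programmatically, the snippet below builds a query against the REST endpoint `/taskmanagers/<tm-id>/metrics` and parses the JSON list it returns. The base URL `http://localhost:8081`, the TaskManager id `tm-1`, and the sample payload are assumptions for illustration only; the real ids come from your own cluster, and which metrics are available depends on the Flink version and setup.

```python
import json
from urllib.parse import quote


def taskmanager_metrics_url(base_url, tm_id, metric_names):
    """Build the REST URL that asks one TaskManager for the given metrics."""
    names = ",".join(metric_names)
    return (
        f"{base_url.rstrip('/')}/taskmanagers/{quote(tm_id, safe='')}"
        f"/metrics?get={quote(names, safe=',.')}"
    )


def parse_metrics(payload):
    """Turn the JSON list of {"id": ..., "value": ...} objects into a dict of floats."""
    return {m["id"]: float(m["value"]) for m in json.loads(payload)}


# Hypothetical address and TaskManager id, for illustration only.
url = taskmanager_metrics_url(
    "http://localhost:8081",
    "tm-1",
    ["Status.JVM.CPU.Load", "Status.JVM.Memory.Heap.Used"],
)

# Made-up sample response in the shape the endpoint returns (values are strings).
sample_payload = (
    '[{"id": "Status.JVM.CPU.Load", "value": "0.25"},'
    ' {"id": "Status.JVM.Memory.Heap.Used", "value": "104857600"}]'
)
metrics = parse_metrics(sample_payload)
```

In a real setup the payload would be fetched from `url` (for example with `urllib.request.urlopen`) instead of being hard-coded. Note that these numbers describe the whole TaskManager JVM, not a single slot.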
Hi Penguin,

Building on top of Yangze's response, you can also take a look at the more detailed system resource usage [1] after adding an optional dependency to the class path/lib directory.

Regarding the single task/task slot metrics, as Yangze noted there is "almost" no isolation of resources between Tasks (task slots). Almost, because there is one thing to note. Most of Flink's Tasks are single threaded, and you can actually monitor how busy this single thread is using the `idleTimeMsPerSecond` metric [2] (which was added in Flink 1.11). In Flink 1.13 this metric will change a little bit, as it will be split into two: `idleTimeMsPerSecond` and `backPressuredTimeMsPerSecond`. Additionally, those two will be complemented with the `busyTimeMsPerSecond` metric [3][4][5]. Those metrics will also be easily accessible in the WebUI [6].

I wrote "Most of Flink's Tasks are single threaded" because there are a couple of caveats:
- network communication is done in a separate pool of threads
- old style sources (using the `SourceFunction` primitive, so basically all sources apart from a couple of new ones introduced in Flink 1.12) spawn another dedicated thread, which is not monitored/covered by those busy/idle time metrics
- if an operator or user code spawns its own threads somehow, those are also completely ignored (this includes the built-in AsyncWaitOperator [7])

Best,
Piotrek

Mon, 18 Jan 2021 at 03:33 Yangze Guo <[hidden email]> wrote:
Hi,
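To make the idle/busy relationship above a bit more concrete, here is a minimal sketch of how a per-second busy estimate could be derived from the idle and back-pressured readings. The formula (1000 ms minus idle minus back-pressured, clamped to the range 0 to 1000) is an assumption for illustration and not necessarily how Flink computes `busyTimeMsPerSecond` internally. The function names and sample numbers are hypothetical, and the caveats above still apply: threads outside the main task thread are not covered.

```python
def busy_ms_per_second(idle_ms, back_pressured_ms=0.0):
    """Estimate how many ms per second the task thread was busy.

    Assumption: each second is split between idle, back-pressured and busy time.
    On Flink 1.11/1.12 only the idle reading exists, so pass it alone.
    """
    busy = 1000.0 - idle_ms - back_pressured_ms
    # Clamp, since independently sampled metrics may not add up exactly.
    return max(0.0, min(1000.0, busy))


def busiest_subtask(idle_by_subtask):
    """Return the subtask index with the lowest idle time per second."""
    return min(idle_by_subtask, key=idle_by_subtask.get)


# Made-up readings of idleTimeMsPerSecond, keyed by subtask index.
idle_readings = {0: 800.0, 1: 120.0, 2: 500.0}
hot_subtask = busiest_subtask(idle_readings)
hot_busy = busy_ms_per_second(idle_readings[hot_subtask])
```

A subtask whose estimate stays close to 1000 is a likely bottleneck candidate, even though its CPU and memory cannot be isolated from the other tasks on the same TaskManager.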