Yes, I’m talking about the “%CPU” metric that everyone uses everywhere, in every performance monitoring product, and in top(1).
You might think that 90% CPU utilization means the processor is mostly busy executing instructions, with only a small idle remainder.
What it might actually mean is that most of that “busy” time is spent stalled, with only a small fraction spent actually running instructions.
Stalled means the processor is making no forward progress on instructions, usually because it is waiting on memory I/O. The busy-to-stalled ratio described above is what I often see in real production environments. Chances are you’re mostly stalled and don’t even know it.
What does this mean for you? Knowing how much of your CPU time is stalled can direct performance tuning efforts between reducing the code executed and reducing memory I/O. Anyone who cares about CPU performance, especially on clouds that auto-scale based on CPU, would benefit from knowing the stalled component of their %CPU.
The metric we call CPU utilization is really “non-idle time”: the time the CPU was not running the idle thread. Your operating system kernel (whichever it is) usually tracks this across context switches. If a non-idle thread begins running, then stalls for 100 milliseconds while waiting on memory I/O, the kernel still considers the CPU to be in use for that entire time.
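To make this concrete, here is a minimal sketch (my own illustration, not from any particular tool) of how %CPU is typically derived on Linux: read the aggregate counters in /proc/stat twice and treat everything that isn’t idle time as “busy”. It also shows why the metric cannot distinguish stalled from productive cycles; that information simply isn’t there.

    # Minimal sketch, assuming the Linux /proc/stat layout:
    # cpu  user nice system idle iowait irq softirq steal ...
    # "%CPU" here is simply non-idle time.
    import time

    def cpu_times():
        with open("/proc/stat") as f:
            fields = f.readline().split()[1:]   # first line: aggregate "cpu" counters
        return [int(x) for x in fields]

    def cpu_percent(interval=1.0):
        t1 = cpu_times()
        time.sleep(interval)
        t2 = cpu_times()
        deltas = [b - a for a, b in zip(t1, t2)]
        total = sum(deltas)
        idle = deltas[3] + deltas[4]            # idle + iowait
        return 100.0 * (total - idle) / total   # "busy" = everything that isn't idle

    print(f"%CPU = {cpu_percent():.1f} (says nothing about stalled vs. retiring cycles)")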
This metric is as old as time-sharing itself. The Apollo Lunar Module guidance computer, a pioneering time-sharing system, called its idle thread the “DUMMY JOB”, and engineers tracked the cycles spent running it versus the cycles spent on real tasks as an important measure of the computer’s utilization.
Nowadays CPUs have become much faster than main memory, and waiting on memory dominates much of what is still called “CPU utilization”. When you see a high %CPU figure, you might think the processor is the bottleneck (the CPU package under the heatsink and fan), when it’s really those banks of DRAM.
And this has been getting worse over time. For a long time, processor manufacturers grew clock speeds faster than DRAM improved its access latency, the so-called “CPU–DRAM gap”. That levelled out around 2005, when 3 GHz processors arrived; since then, processors have scaled using more cores and hyperthreads, plus multi-socket configurations, all of which place more demand on the memory subsystem. Processor manufacturers have tried to ease this memory bottleneck with larger and smarter CPU caches and faster memory buses and interconnects. But we are still usually stalled.
The way to tell is to use Performance Monitoring Counters (PMCs): hardware counters that can be read with Linux perf and other tools. For example, measuring the whole system for 10 seconds:
# perf stat -a -- sleep 10

 Performance counter stats for 'system wide':

     641398.723351      task-clock (msec)         #   64.116 CPUs utilized            (100.00%)
           379,651      context-switches          #    0.592 K/sec                    (100.00%)
            51,546      cpu-migrations            #    0.080 K/sec                    (100.00%)
        13,423,039      page-faults               #    0.021 M/sec
 1,433,972,173,374      cycles                    #    2.236 GHz                      (75.02%)
   <not supported>      stalled-cycles-frontend
   <not supported>      stalled-cycles-backend
 1,118,336,816,068      instructions              #    0.78  insns per cycle          (75.01%)
   249,644,142,804      branches                  #  389.218 M/sec                    (75.01%)
     7,791,449,769      branch-misses             #    3.12% of all branches          (75.01%)

      10.003794539 seconds time elapsed
A key metric here is instructions per cycle (IPC), which shows how many instructions the CPU completes, on average, per clock cycle. Simply put, higher is better. The 0.78 in the example above sounds good (78% busy?) until you realize that this processor’s top speed is an IPC of 4.0. This is also known as 4-wide, referring to the instruction fetch/decode path: the CPU can retire (complete) four instructions every clock cycle. So an IPC of 0.78 on a 4-wide system means the CPU is running at 19.5% of its top speed. The new Intel Skylake processors are 5-wide.
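To spell out the arithmetic, here is a tiny sketch using the counts from the perf output above (the 4-wide figure is an assumption about this particular processor):

    # IPC and percent-of-top-speed, using the counters from the perf stat output above.
    instructions = 1_118_336_816_068
    cycles       = 1_433_972_173_374
    width        = 4.0                     # assumed: a 4-wide (4 instructions/cycle) CPU

    ipc = instructions / cycles
    print(f"IPC = {ipc:.2f}")                            # ~0.78
    print(f"% of top speed = {100 * ipc / width:.1f}%")  # ~19.5%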
There are hundreds more PMCs you can use to dig further: you can directly measure stalled cycles, broken down by type.
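For instance, a rough sketch like the following (my own illustration; event availability depends on your CPU and perf version, as the “&lt;not supported&gt;” lines above suggest) shells out to perf stat in CSV mode and collects whichever stall-related counters are reported:

    # Sketch: read a few stall-related PMCs via "perf stat -x ," (CSV output on stderr).
    # Unsupported events print no numeric count and are skipped.
    import subprocess

    EVENTS = "cycles,instructions,stalled-cycles-frontend,stalled-cycles-backend"

    def read_pmcs(seconds=10):
        cmd = ["perf", "stat", "-a", "-x", ",", "-e", EVENTS, "--", "sleep", str(seconds)]
        out = subprocess.run(cmd, capture_output=True, text=True).stderr
        counts = {}
        for line in out.splitlines():
            fields = line.split(",")
            # CSV fields start with: count, unit, event name, ...
            if len(fields) >= 3 and fields[0].replace(".", "").isdigit():
                counts[fields[2]] = float(fields[0])
        return counts

    print(read_pmcs(seconds=10))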
If you are in a virtual environment, you may not be able to access PMCs at all, depending on whether the hypervisor exposes them to guests. I recently wrote “The PMCs of EC2: Measuring IPC”, showing how PMCs are now available for dedicated host types on the Xen-based AWS EC2 cloud.
If your IPC is &lt; 1.0, you are likely memory stalled. Look for ways to reduce memory I/O and improve memory locality and CPU caching, especially on NUMA systems. For hardware tuning, consider processors with larger CPU caches and faster memory, buses, and interconnects.
If your IPC is &gt; 1.0, you are likely instruction bound. Look for ways to reduce code execution: eliminate unnecessary work, cache the results of operations, and so on. CPU flame graphs are a great tool for this investigation. For hardware tuning, try faster clock rates and more cores/hyperthreads.
Every performance tool should show IPC alongside %CPU. Or break %CPU down into instruction-retired cycles versus stalled cycles, e.g. %INS and %STL.
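As one sketch of what such a breakdown could look like (my own illustration; the %INS/%STL names and the assumption that frontend plus backend stalls approximate total stalled cycles are simplifications):

    # Sketch: split %CPU into %STL (stalled) and %INS (retiring) portions.
    # Assumes the stall counters are supported on this CPU.
    def split_cpu(pct_cpu, cycles, stalled_frontend, stalled_backend):
        stalled = min(stalled_frontend + stalled_backend, cycles)  # crude approximation
        pct_stl = pct_cpu * stalled / cycles
        pct_ins = pct_cpu - pct_stl
        return pct_ins, pct_stl

    # Hypothetical numbers, for illustration only:
    pct_ins, pct_stl = split_cpu(pct_cpu=90.0, cycles=1.43e12,
                                 stalled_frontend=0.3e12, stalled_backend=0.5e12)
    print(f"%INS = {pct_ins:.1f}, %STL = {pct_stl:.1f}")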
tiptop(1) for Linux shows IPC per process:
tiptop -                  [root]
Tasks:  96 total,   3 displayed                               screen  0: default

  PID [ %CPU] %SYS  P   Mcycle   Minstr   IPC  %MISS  %BMIS  %BUS COMMAND
 3897   35.3  28.5  4   274.06   178.23  0.65   0.06   0.00   0.0 java
 1319+   5.5   2.6  6    87.32   125.55  1.44   0.34   0.26   0.0 nm-applet
  900    0.9   0.0  6    25.91    55.55  2.14   0.12   0.21   0.0 dbus-daemo
Memory stall cycles are not the only thing that makes CPU utilization misleading; there are other factors as well.
CPU utilization has become a deeply misleading metric: it includes cycles spent waiting on main memory, which can dominate modern workloads. You can figure out what %CPU really means by using additional metrics, including instructions per cycle (IPC). An IPC &lt; 1.0 likely means memory bound, and an IPC &gt; 1.0 likely means instruction bound. I introduced IPC earlier, along with the Performance Monitoring Counters (PMCs) needed to measure it. Performance monitoring products that display %CPU should also display PMC metrics to explain what that value really means, so they don’t mislead the end user. For example, they could show %CPU together with IPC, and/or instruction-retired cycles versus stalled cycles. Armed with these metrics, developers and operators can decide how best to tune their applications and systems.