Performance anomaly locating and performance monitoring on Linux

Introduction: Most services run on Linux, and although Linux is widely deployed, performance problems are still common. This article walks through the main performance monitoring indicators: disk I/O, memory, CPU, number of TCP connections, network traffic, and processes or threads. The commands involved are iostat, vmstat, sar, mpstat, netstat, ss, iftop, free, pstree/ps, pidstat, top, and uptime. Each area is covered in more detail below.

1. Disk I/O (iostat)

Much of the data on a machine lives on disk, so many reads and writes end up touching the disk. A disk is a slow device compared with memory, and requests to it can block, which makes disk I/O monitoring very important. We use iostat to diagnose disk conditions. The machine used here is a Tencent Cloud host.

tps: the number of transfers per second issued to the device, that is, how many I/O requests it receives per second

Blk_read/s: the amount of data read from the device per second, in blocks

Blk_wrtn/s: the amount of data written to the device per second, in blocks

Blk_read: total amount of data read

Blk_wrtn: total amount of data written

%user: percentage of CPU time spent in user-mode processes

%nice: percentage of CPU time spent in user-mode processes running with a nice priority

%system: percentage of CPU time spent in kernel mode

%iowait: percentage of time the CPU was idle while waiting for outstanding I/O

%steal: percentage of time a virtual CPU had to wait because the hypervisor was serving another virtual machine; this only matters under virtualization

%idle: percentage of time the CPU was idle

iostat also has a commonly used option, -x, which displays extended statistics.

rrqm/s: the number of read requests for this device that were merged per second (several adjacent I/O requests combined into one)

wrqm/s: the number of write requests for this device that were merged per second

r/s: the number of read requests sent to the device per second

w/s: the number of write requests sent to the device per second

rsec/s: the number of sectors read from the device per second

wsec/s: the number of sectors written to the device per second

avgrq-sz: average request size, in sectors

avgqu-sz: average request queue length

await: average time taken by each I/O request, including the time spent waiting in the queue

r_await: average time taken by each read I/O request

w_await: average time taken by each write I/O request

svctm: the average service time of each I/O operation. If svctm is very close to await, there is almost no queueing for I/O. If await is much higher than svctm, requests are waiting in the I/O queue for too long. Note that the iostat manual in newer sysstat releases warns that svctm is no longer reliable, so treat it as a rough indicator.

%util: the percentage of elapsed time during which the device was busy processing I/O requests. This is device utilization, not CPU consumption. For example, if the sampling interval is 1s and the device spends 0.65s processing I/O and 0.35s idle, then %util = 0.65/1 = 65%. A value near 100% generally means the device is running close to full capacity. However, when the device is backed by several disks that can serve requests concurrently, %util can reach 100% before the device has actually hit its bottleneck.
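The %util arithmetic and the await/svctm comparison can be sketched in a few lines of Python. This is only an illustration: the function names `util_percent` and `queue_wait_ms` are made up for this example, and the input numbers are sample values, not measurements from the host used in this article.

```python
def util_percent(busy_seconds, interval_seconds):
    """Share of the sampling interval during which the device was busy."""
    if interval_seconds <= 0:
        raise ValueError("interval_seconds must be positive")
    return 100.0 * busy_seconds / interval_seconds


def queue_wait_ms(await_ms, svctm_ms):
    """Rough time a request spent queued: total latency minus service time."""
    return max(await_ms - svctm_ms, 0.0)


# Sample values (hypothetical): busy for 0.65s out of a 1s interval.
print(util_percent(0.65, 1.0))

# Sample values (hypothetical): await of 12 ms against svctm of 2.5 ms.
print(queue_wait_ms(12.0, 2.5))
```

A large result from `queue_wait_ms` corresponds to the case where await is much higher than svctm, meaning requests spend most of their time queued rather than being served.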

2. Memory (free)

On Linux, we use the free command to check memory usage. The first line of information in its output (which we can think of as the operating system's view) contains:

total: total physical memory size

used: allocated size

free: unallocated size

shared: the size of shared memory, mainly used for inter-process communication (IPC)

buffers: memory used for buffering block devices

cached: memory used for caching file contents, that is, the page cache

A "cache" is a region of memory set aside as a buffer between processes and the hard disk. Processes write data into the cache, and when that data is needed again it is read straight from the cache (the "highway") instead of from the hard disk (the "dirt road"), which greatly improves performance.

The buffers here mainly hold metadata about our data (directory names, file sizes, the blocks a file occupies, modification times, permissions, and so on), while the cache holds the contents of recently read files.

The third line of information (which we can think of as the application's view)

The -/+ buffers/cache line contains two values, -buffers/cache and +buffers/cache:

-buffers/cache = used (first line) - buffers - cached. This is the physical memory actually used by running programs.

+buffers/cache = free (first line) + buffers + cached. Buffers and cache are memory temporarily "lent" to the system for buffering and can be reclaimed, so this is the memory that applications can still obtain.

total = (-buffers/cache) + (+buffers/cache)

So from the application level, available memory = free memory + buffers + cached

We can view the detailed information in the following way.

~ cat /proc/meminfo

MemTotal: 1020128 kB

MemFree: 670772 kB

Buffers: 97780 kB

Cached: 100980 kB

SwapCached: 0 kB

Active: 164988 kB

Inactive: 117296 kB

Active(anon): 83536 kB

Inactive(anon): 160 kB

Active(file): 81452 kB

Inactive(file): 117136 kB

Unevictable: 0 kB

Mlocked: 0 kB

SwapTotal: 0 kB

SwapFree: 0 kB

Dirty: 92 kB

Writeback: 0 kB

AnonPages: 83504 kB

Mapped: 17500 kB

Shmem: 172 kB

Slab: 46696 kB

SReclaimable: 28652 kB

SUnreclaim: 18044 kB

KernelStack: 1744 kB

PageTables: 2636 kB

NFS_Unstable: 0 kB

Bounce: 0 kB

WritebackTmp: 0 kB

CommitLimit: 510064 kB

Committed_AS: 343800 kB

VmallocTotal: 34359738367 kB

VmallocUsed: 7112 kB

VmallocChunk: 34359727304 kB

HardwareCorrupted: 0 kB

AnonHugePages: 36864 kB

HugePages_Total: 0

HugePages_Free: 0

HugePages_Rsvd: 0

HugePages_Surp: 0

Hugepagesize: 2048 kB

DirectMap4k: 8184 kB

DirectMap2M: 1040384 kB
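As a worked example of the formula available memory = free + buffers + cached, here is a minimal Python sketch that parses text in the /proc/meminfo format. The helper names `parse_meminfo` and `memory_summary` are hypothetical, and the sample text is a four-line excerpt of the listing above.

```python
# Excerpt of the /proc/meminfo listing shown in this article.
MEMINFO_SAMPLE = """\
MemTotal: 1020128 kB
MemFree: 670772 kB
Buffers: 97780 kB
Cached: 100980 kB
"""


def parse_meminfo(text):
    """Map each field name to its integer value (kB for most fields)."""
    values = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        parts = rest.split()
        if not sep or not parts:
            continue  # skip blank or malformed lines
        values[name.strip()] = int(parts[0])
    return values


def memory_summary(info):
    """Apply the application-level view: buffers and cache count as reclaimable."""
    available = info["MemFree"] + info["Buffers"] + info["Cached"]
    used_by_apps = info["MemTotal"] - available
    return {"available_kb": available, "used_by_apps_kb": used_by_apps}


summary = memory_summary(parse_meminfo(MEMINFO_SAMPLE))
print(summary)
```

On a live system the same functions could be fed with `open("/proc/meminfo").read()` instead of the embedded sample.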

3. CPU (dstat, mpstat)

First, we use the dstat command to check the state of the CPU. It prints its statistics in real time.

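Here the output is refreshed every 2 seconds, 10 times in total (dstat takes the interval and the count as its trailing arguments, for example dstat 2 10).

cpu: hiq and siq are the shares of CPU time spent on hardware interrupts and software interrupts

system: int and csw are the number of interrupts and the number of context switches

-c: show only CPU information

-m: show only memory information

-p: show only process information

-n: show only network information

To choose which statistics are shown and in what order, append the corresponding options to the command.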

mpstat

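%user: user-mode CPU time (%) during the interval, excluding niced processes, (usr/total)*100
%nice: CPU time (%) used by niced processes during the interval, (nice/total)*100
%sys: kernel time (%) during the interval, (system/total)*100
%iowait: time (%) spent waiting for disk I/O during the interval, (iowait/total)*100
%irq: hardware interrupt time (%) during the interval, (irq/total)*100
%soft: software interrupt time (%) during the interval, (softirq/total)*100
%idle: time (%) during the interval that the CPU was idle for any reason other than waiting for disk I/O, (idle/total)*100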
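The (field/total)*100 formulas can be illustrated with a short Python sketch that takes two snapshots of cumulative CPU counters and computes the percentage for each field over the interval between them. The function name `cpu_percentages` and the counter values are hypothetical; on a real system the counters would come from the cpu line of /proc/stat, which is an assumption about the data source rather than something this article shows.

```python
FIELDS = ("user", "nice", "system", "idle", "iowait", "irq", "softirq")


def cpu_percentages(before, after):
    """Percentage of the interval spent in each state: (delta / total) * 100."""
    deltas = {name: after[name] - before[name] for name in FIELDS}
    total = sum(deltas.values())
    if total == 0:
        return {name: 0.0 for name in FIELDS}
    return {name: 100.0 * delta / total for name, delta in deltas.items()}


# Hypothetical cumulative counters at the start and end of one interval.
before = dict(user=1000, nice=10, system=500, idle=8000,
              iowait=200, irq=5, softirq=15)
after = dict(user=1060, nice=10, system=520, idle=8100,
             iowait=215, irq=5, softirq=20)

for name, value in cpu_percentages(before, after).items():
    print(f"%{name}: {value:.1f}")
```

Working on deltas rather than raw counters is what makes the result describe the chosen interval instead of the whole time since boot.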

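4. Number of TCP connections (ss, netstat)

ss stands for Socket Statistics. As the name suggests, the ss command retrieves socket information. It shows much the same content as netstat, but it is faster and more efficient, and it gives more detailed information about TCP connections. When the number of socket connections is very large, both the netstat command and reading the connections directly from the kernel with cat /proc/net/tcp become very slow.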

ss is fast because it uses tcp_diag, a kernel module for TCP analysis and statistics. Through it, ss obtains first-hand information directly from the Linux kernel, which is what makes it efficient.

Timing netstat and ss side by side shows the difference: the netstat command is clearly much slower than the ss command.

The netstat command

netstat shows the connection state of the daemon processes on the system and the port numbers they are listening on.

-t: show TCP connections

-u: show UDP connections

-n: show addresses and ports in numeric form

-p: show the PID and name of the program that owns each socket

View the listening status of the daemon processes on the system

The State column shows the status of each socket.
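To make the State column concrete, the following Python sketch counts sockets per state from text in the /proc/net/tcp format mentioned earlier. The sample rows and the function name `count_tcp_states` are hypothetical; the hexadecimal state codes follow the kernel's TCP state numbering (for example 01 for ESTABLISHED and 0A for LISTEN).

```python
from collections import Counter

# Kernel TCP state codes as they appear in the "st" column.
TCP_STATES = {
    "01": "ESTABLISHED", "02": "SYN_SENT", "03": "SYN_RECV",
    "04": "FIN_WAIT1", "05": "FIN_WAIT2", "06": "TIME_WAIT",
    "07": "CLOSE", "08": "CLOSE_WAIT", "09": "LAST_ACK",
    "0A": "LISTEN", "0B": "CLOSING",
}

# Hypothetical sample in /proc/net/tcp layout (trailing columns shortened).
SAMPLE = """\
  sl  local_address rem_address   st tx_queue rx_queue
   0: 00000000:0016 00000000:0000 0A 00000000:00000000
   1: 0100007F:0CEA 00000000:0000 0A 00000000:00000000
   2: 0A00000A:0016 0A000014:C350 01 00000000:00000000
   3: 0A00000A:0016 0A000015:C351 06 00000000:00000000
"""


def count_tcp_states(text):
    """Count sockets per TCP state; rows are recognised by their 'N:' slot field."""
    counts = Counter()
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 4 or not fields[0].endswith(":"):
            continue  # header or malformed line
        counts[TCP_STATES.get(fields[3].upper(), "UNKNOWN")] += 1
    return counts


print(dict(count_tcp_states(SAMPLE)))
```

This reads the same data that netstat walks through line by line, which is part of why it becomes slow with many connections.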

ss command

View the network connection statistics of the current server: ss -s

The other ss options are used in largely the same way as their netstat counterparts.

5. Network (iftop)

Run iftop -i eth0 to watch the traffic on the eth0 interface.

Press Ctrl+C to exit.

The -i parameter selects which network card to monitor, so we can watch the traffic of different interfaces. Inside the iftop interface, press p to show per-port traffic information.

6. Process information (ps/pstree, top, pidstat)

We use pstree to view the process tree. All processes shown are descendants of the init process.

ps command

To view a specific process, such as the MySQL process, we can use ps aux | grep mysqld or ps -elf | grep mysqld. There is essentially no difference between the two. Linux inherited conventions from Unix, so ps accepts both BSD-style options (ps aux) and System V-style options (ps -elf).
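The same kind of lookup can be sketched in Python by filtering text in the ps aux layout. The function name `find_processes` and the sample rows are hypothetical, and the sketch assumes the usual eleven-column ps aux output where the command is the last column.

```python
# Hypothetical sample in the usual "ps aux" column layout.
PS_AUX_SAMPLE = """\
USER       PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
root         1  0.0  0.1  19232  1500 ?        Ss   Jan01   0:01 /sbin/init
mysql     1234  0.5  4.2 900000 43000 ?        Sl   Jan01  12:34 /usr/sbin/mysqld --user=mysql
"""


def find_processes(ps_aux_text, name):
    """Return (pid, command) pairs whose command line contains `name`."""
    matches = []
    for line in ps_aux_text.splitlines()[1:]:  # skip the header row
        fields = line.split(None, 10)  # 11th field keeps the full command line
        if len(fields) < 11:
            continue
        pid, command = int(fields[1]), fields[10]
        if name in command:
            matches.append((pid, command))
    return matches


print(find_processes(PS_AUX_SAMPLE, "mysqld"))
```

On a live system the text could be obtained with `subprocess.run(["ps", "aux"], capture_output=True, text=True).stdout`.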
