A remark first: being a DBA really is no easy job.
If the system occasionally stalls, queries intermittently slow down, or hard-to-reproduce "phantom" problems appear, try not to solve them by trial and error: the risk is high
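Sample at high frequency: run SHOW GLOBAL STATUS about once per second and watch counters such as Threads_running, Threads_connected, Questions and Queries for spikes or dips that show when the problem occurs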
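A minimal sketch of the analysis step, assuming the SHOW GLOBAL STATUS output has already been sampled once per second into Python dicts (the collection itself, e.g. a shell loop around the mysql client, is not shown). Cumulative counters such as Questions have to be turned into per-second deltas before spikes or dips become visible, while a gauge such as Threads_running can be read directly. The helper names and the "2x or 1/2x the median" rule are illustrative assumptions, not part of any tool.

```python
from statistics import median
from typing import Dict, List


def per_second_deltas(samples: List[Dict[str, int]], counter: str) -> List[int]:
    """Turn a cumulative counter (e.g. Questions) into per-interval increments.

    `samples` holds one dict per 1-second sample, mapping status
    variable names to integer values.
    """
    values = [s[counter] for s in samples]
    return [b - a for a, b in zip(values, values[1:])]


def flag_outliers(series: List[float], factor: float = 2.0) -> List[int]:
    """Return indexes whose value is above factor*median or below median/factor.

    The factor is an arbitrary, illustrative choice.
    """
    mid = median(series)
    return [i for i, v in enumerate(series)
            if v > mid * factor or v < mid / factor]


if __name__ == "__main__":
    # Hypothetical samples, one per second.
    samples = [
        {"Questions": 1000, "Threads_running": 3},
        {"Questions": 1500, "Threads_running": 4},
        {"Questions": 2010, "Threads_running": 3},
        {"Questions": 2060, "Threads_running": 40},  # stall: few queries finish
        {"Questions": 2560, "Threads_running": 5},
    ]
    qps = per_second_deltas(samples, "Questions")
    running = [s["Threads_running"] for s in samples]
    print("queries/s:", qps)
    print("queries/s outliers:", flag_outliers(qps))
    print("Threads_running outliers:", flag_outliers(running))
```

Note that the dip in queries per second and the spike in Threads_running point at neighbouring samples: the counter delta covers the interval ending at a sample, while the gauge is a snapshot taken at that sample.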
Enable the slow query log, run SET GLOBAL long_query_time = 0, and make sure every connection picks up the new setting (you may need to re-establish all connections, because a global change only applies to sessions opened afterwards)
Pay attention to periods in the log where throughput suddenly drops. A query is written to the slow query log only when it completes, so a pileup shows up as a sudden dip in completions until the culprit finishes and releases the resource
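Because a query reaches the slow query log only when it completes, counting log entries per second gives a rough throughput curve, and a pileup appears as a run of near-empty seconds. The sketch assumes the completion timestamps have already been extracted from the log as Unix epoch seconds (parsing the log format is not shown); the function names and the 25%-of-median threshold are illustrative assumptions.

```python
from collections import Counter
from statistics import median
from typing import Iterable, List, Tuple


def completions_per_second(timestamps: Iterable[float]) -> List[Tuple[int, int]]:
    """Bucket completion timestamps (epoch seconds) into whole seconds.

    Seconds with no completions are kept with a count of 0, since an
    empty second is exactly what a pileup looks like.
    """
    buckets = Counter(int(t) for t in timestamps)
    first, last = min(buckets), max(buckets)
    return [(sec, buckets[sec]) for sec in range(first, last + 1)]


def throughput_drops(per_second: List[Tuple[int, int]],
                     fraction: float = 0.25) -> List[int]:
    """Return the seconds whose completion count is below fraction * median."""
    mid = median([count for _, count in per_second])
    return [sec for sec, count in per_second if count < mid * fraction]
```

The seconds this returns are the places to look for the query that finished right after the gap, since it is often the one that was holding the resource.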
Good tools make the job far easier: tcpdump, pt-query-digest, Percona Server
Visualize the data with plotting tools such as gnuplot or R
Recommendation: start with the first two methods. They are cheap, and the data can be collected interactively with simple shell scripts or repeatedly executed queries
For an intermittent problem, collect as much data as possible (not only while the problem is occurring)
Work out two things: 1. a reliable way to tell when the problem is occurring (a trigger); 2. a tool to collect the diagnostic data
False positive: lots of diagnostic data is collected during periods when there is no problem, which wastes time (this does not contradict the previous point; read it carefully)
False negative: no data is captured when the problem occurs, so the opportunity is missed. Before you start collecting, make sure the trigger really identifies the problem.
Good triggers:
Find a metric that can be compared against a threshold derived from normal behavior
Choose an appropriate threshold: high enough that it does not fire during normal operation, but not so high that it misses the problem when it occurs
Recommended tool: pt-stalk. It works as a trigger: when a set condition is met it starts recording data, and you can configure the variable to monitor, the threshold, and how often the check runs
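The trigger idea behind pt-stalk (watch one variable, compare it against a threshold at a fixed interval, and fire only after the condition has held for several consecutive checks) can be sketched as follows. This is not pt-stalk's code: `read_value` and `collect` are hypothetical callbacks standing in for reading a status variable such as Threads_running and for gathering diagnostic data, and the defaults are illustrative.

```python
import time
from typing import Callable, Optional


def watch(read_value: Callable[[], float],
          collect: Callable[[], None],
          threshold: float,
          cycles: int = 5,
          interval: float = 1.0,
          max_checks: Optional[int] = None) -> int:
    """Poll read_value() every `interval` seconds and call collect() once the
    value has exceeded `threshold` on `cycles` consecutive checks.

    Returns how many times collection was triggered. `max_checks` bounds
    the loop so the sketch can terminate (None means run forever).
    """
    hits = 0
    triggered = 0
    checks = 0
    while max_checks is None or checks < max_checks:
        checks += 1
        if read_value() > threshold:
            hits += 1
            if hits >= cycles:
                collect()
                triggered += 1
                hits = 0  # start counting again after a collection
        else:
            hits = 0  # the condition must hold on consecutive checks
        time.sleep(interval)
    return triggered
```

Requiring several consecutive hits is one way to cut false positives from a single noisy sample; keeping the threshold moderate guards against false negatives. Unlike pt-stalk, this sketch has no pause after a collection.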
Execution time consists of two parts: time spent working and time spent waiting
Collect all the data you can, but only during the time period you need
An unexplained problem usually has one of two causes: 1. the server has a lot of work to do and consumes a lot of CPU; 2. the server is waiting for resources to be released
Different methods are needed to collect diagnostic data and confirm which cause applies:
1. Profile report: confirms whether there is too much work. Tools: tcpdump to capture TCP traffic; the slow query log, switched on for the collection period and off afterwards
2. Wait analysis: confirms whether there are a lot of waits. Tools: GDB stack traces; SHOW PROCESSLIST and SHOW ENGINE INNODB STATUS to observe thread and transaction states
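For the SHOW PROCESSLIST part, a quick check is to count how many threads are in each State, since a pileup usually shows up as many threads stuck in the same state. The sketch assumes the vertical output of `SHOW PROCESSLIST\G` from the mysql client has been captured as text; it only looks at the `State:` lines.

```python
import re
from collections import Counter
from typing import List, Tuple

# A line such as "  State: Sending data" in the vertical (\G) output.
STATE_LINE = re.compile(r"^\s*State:\s*(.*?)\s*$")


def count_states(processlist_text: str) -> List[Tuple[str, int]]:
    r"""Count threads per State in `SHOW PROCESSLIST\G` output, most common first.

    An empty State is reported as "NULL".
    """
    states: Counter = Counter()
    for line in processlist_text.splitlines():
        match = STATE_LINE.match(line)
        if match:
            states[match.group(1) or "NULL"] += 1
    return states.most_common()
```

Running this on a snapshot taken every second and watching one state's count climb is a cheap way to see a pileup forming.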
When reviewing the collected data, check: 1. whether the problem really occurred; 2. whether there are obvious spikes or sudden changes
Tools:
oprofile: uses the performance counters provided by the CPU hardware, together with statistical sampling, to find the "culprit" consuming CPU at the process, function, and source-code levels
The opreport command shows CPU usage at the process level and at the function level
opreport output columns:
samples      number of samples that fell within the image
%            those samples as a percentage of all samples
image name   the executable or library (image) the samples belong to
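The % column is each image's sample count divided by the total number of samples. A small sketch that rebuilds an opreport-style summary from per-image sample counts; the function name is hypothetical and the image names and counts in the usage example are made up for illustration.

```python
from typing import Dict, List, Tuple


def sample_summary(counts: Dict[str, int]) -> List[Tuple[int, float, str]]:
    """Build opreport-style rows (samples, %, image name), largest first.

    `counts` maps an image name to the number of samples taken in it.
    """
    total = sum(counts.values())
    rows = [(n, 100.0 * n / total, image) for image, n in counts.items()]
    rows.sort(reverse=True)
    return rows


if __name__ == "__main__":
    # Made-up sample counts, for illustration only.
    for samples, pct, image in sample_summary(
            {"mysqld": 750, "libc-2.17.so": 200, "vmlinux": 50}):
        print(f"{samples:8d} {pct:6.2f}% {image}")
```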
The opannotate command shows CPU usage statistics at the source-code level
GDB: the most commonly used debugger in Linux application development (it debugs executables). It can set breakpoints, inspect variable values and source code, step through the program's execution, and show memory and stack information. These features make it easy to find non-syntax (logic) errors in a program.
Diagnosing intermittent performance problems requires knowledge of MySQL, InnoDB, and GNU/Linux
Be clear about: 1. what the problem is, described precisely; 2. what has already been done to try to solve it
Start by: 1. understanding the server's behavior; 2. reviewing the server's status variables, configuration, and software/hardware environment (pt-summary, pt-mysql-summary)
Don't get distracted by tangents. Write your questions down on a slip of paper and cross each one off as you check it
Is it a cause or an effect?
Possible reasons a resource becomes inefficient:
1. the resource is overused and lacks spare capacity; 2. the resource is not configured correctly; 3. the resource is damaged or malfunctioning
USER_STATISTICS (Percona Server): a set of tables for measuring and auditing database activity
strace: traces system calls and measures them in actual (wall-clock) time, with unpredictable overhead; oprofile, by contrast, measures CPU cycles spent
The most effective way to define performance is response time
If it cannot be measured, it cannot be effectively optimized. Performance optimization work needs to be based on high-quality, comprehensive and complete response time measurement
The best place to start measuring is the application. Even if the problem lies in the underlying database, good measurements make it easier to find
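One low-cost way to start measuring in the application is to wrap each operation in a timer and accumulate response times per operation. A minimal sketch; the `timed` context manager, the operation names, and the in-memory dict are illustrative assumptions (a real application would send these numbers to its metrics or logging system).

```python
import time
from collections import defaultdict
from contextlib import contextmanager
from typing import DefaultDict, Dict, List, Tuple

# operation name -> list of response times in seconds
response_times: DefaultDict[str, List[float]] = defaultdict(list)


@contextmanager
def timed(operation: str):
    """Record the wall-clock response time of the wrapped block under `operation`."""
    start = time.perf_counter()
    try:
        yield
    finally:
        response_times[operation].append(time.perf_counter() - start)


def summary() -> Dict[str, Tuple[int, float]]:
    """Return {operation: (count, total_seconds)}, largest total first."""
    totals = {op: (len(ts), sum(ts)) for op, ts in response_times.items()}
    return dict(sorted(totals.items(), key=lambda kv: kv[1][1], reverse=True))
```

Usage: put `with timed("checkout"):` around the code that issues the queries for that operation. Measuring at this level captures the full response time the user sees, including time spent outside the database.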
Most systems cannot be measured completely, and measurements are sometimes wrong. Find ways to work around these limitations, and stay aware of the flaws and uncertainties of your method.
Complete measurement generates a large amount of data to analyze, so you need a profiler (the best tool for the job)
A profile report summarizes: it glosses over and throws away a lot of detail and won't tell you what is missing, so it cannot be relied on completely
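A profiler in the spirit of pt-query-digest groups similar queries and ranks them by total response time. The sketch below uses a deliberately crude fingerprint (string and number literals replaced by `?`, whitespace collapsed, lowercased) rather than pt-query-digest's real normalization rules, and takes (query, seconds) pairs as input; the function names are hypothetical.

```python
import re
from collections import defaultdict
from typing import Iterable, List, Tuple


def fingerprint(sql: str) -> str:
    """Crude normalization: literals -> '?', collapse whitespace, lowercase."""
    sql = re.sub(r"'(?:[^'\\]|\\.)*'", "?", sql)   # single-quoted strings
    sql = re.sub(r"\b\d+(?:\.\d+)?\b", "?", sql)   # numeric literals
    return re.sub(r"\s+", " ", sql).strip().lower()


def profile(events: Iterable[Tuple[str, float]]) -> List[Tuple[str, int, float, float]]:
    """Return (fingerprint, calls, total_seconds, share_of_total) rows, largest total first."""
    totals = defaultdict(lambda: [0, 0.0])
    for sql, seconds in events:
        row = totals[fingerprint(sql)]
        row[0] += 1
        row[1] += seconds
    grand_total = sum(t for _, t in totals.values()) or 1.0
    rows = [(fp, calls, total, total / grand_total)
            for fp, (calls, total) in totals.items()]
    return sorted(rows, key=lambda r: r[2], reverse=True)
```

The caveat above is visible here: the report keeps only counts and totals, so the distribution of individual executions, and anything that was never captured, disappears from view.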
Time is spent in one of two ways: working or waiting. Most profilers can only measure time spent working, so wait analysis is sometimes a useful supplement, especially when CPU utilization is low but the work never seems to finish
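Within a single process, the split between working and waiting can be approximated by comparing CPU time with wall-clock time: elapsed time not spent on the CPU was spent waiting (on I/O, locks, sleeps, and so on). A minimal sketch, assuming a single-threaded workload, because `time.process_time` counts the CPU time of all threads in the process and the split stops being meaningful for multithreaded code; the `work_and_wait` helper is hypothetical.

```python
import time
from typing import Callable, Tuple


def work_and_wait(fn: Callable[[], None]) -> Tuple[float, float, float]:
    """Run fn() and return (elapsed, cpu, wait) in seconds.

    elapsed: wall-clock time; cpu: user + system CPU time of this process;
    wait: the remainder, i.e. time the process was not running on a CPU.
    """
    wall_start, cpu_start = time.perf_counter(), time.process_time()
    fn()
    elapsed = time.perf_counter() - wall_start
    cpu = time.process_time() - cpu_start
    return elapsed, cpu, max(elapsed - cpu, 0.0)
```

For a function that mostly sleeps, nearly all elapsed time lands in `wait`; a profiler that only samples on-CPU time would show almost nothing for it, which is why wait analysis is needed alongside profiling.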
Optimization and improvement are two different things: when the cost of further improvement exceeds the benefit, stop optimizing
Pay attention to your intuition, but base your thinking and decisions on data as much as possible
In a word: first clarify the problem, choose the right technique, make good use of tools, be careful, keep your reasoning clear and persevere, don't confuse cause and effect, and don't change the system casually before the problem has been identified
The above is the detailed content of [MySQL Database] Interpretation of Chapter 3: Server Performance Analysis (Part 2). For more information, please follow other related articles on the PHP Chinese website!