Some Practices for MySQL Stand-alone Database Optimization
There is a lot to say about database optimization. By the volume of data supported, it can be split into two stages: stand-alone database optimization and sharding (splitting databases and tables). A single instance can generally handle up to around 5 million rows or 10 GB of data; beyond that, sharding should be considered. In addition, interviews at large companies often start with questions about a single-machine database and then move step by step toward sharding, with many optimization questions interspersed along the way. This article describes some practices for stand-alone database optimization, based on MySQL. Corrections are welcome.
1. Table structure optimization
When building an application, the design of the database's table structure often determines the application's performance later on, especially after the number of users grows. Table structure optimization is therefore a very important first step.
1.1 Character set
Generally speaking, choose UTF-8 (in MySQL, prefer the utf8mb4 character set, since MySQL's legacy utf8 stores at most three bytes per character). Although GBK uses less storage space than UTF-8, UTF-8 is compatible with all languages, and scalability should not be sacrificed for that storage space. Converting from GBK to UTF-8 later is very costly and requires a data migration, whereas storage space can be solved simply by buying larger disks.
1.2 Primary key
InnoDB, MySQL's default storage engine, stores each table as a B+ tree, with the primary key as the clustered index and the row data in the leaf nodes. Rows are located quickly by following the primary key down to a leaf. Therefore every table should have a primary key, and preferably an auto-incrementing one: with an auto-increment key, new rows are appended to the B+ tree leaves in primary-key order, so inserts almost never need to move existing data and are very efficient. If the primary key is not auto-incrementing, its values arrive in roughly random order, and a large amount of data may have to be moved to maintain the B+ tree's properties, adding unnecessary overhead.
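As a rough intuition (a toy model in plain Python, not InnoDB internals), inserting keys into a sorted structure in random order forces far more data movement than inserting them in increasing order:

```python
import bisect
import random

def shifts_for(keys):
    """Insert keys into a sorted list one by one, counting how many
    existing elements must move to make room for each insert. This
    loosely mimics how out-of-order primary keys force a clustered
    index to shuffle existing entries."""
    data, moved = [], 0
    for k in keys:
        pos = bisect.bisect_left(data, k)
        moved += len(data) - pos     # elements shifted by this insert
        data.insert(pos, k)
    return moved

n = 10_000
sequential = list(range(n))                     # auto-increment style
random_keys = random.sample(range(10 * n), n)   # random-key style

print(shifts_for(sequential))    # 0: every insert lands at the end
print(shifts_for(random_keys))   # large: most inserts land mid-list
```

Sequential keys always land at the end of the sorted data, which is why auto-increment inserts are cheap.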
1.3 Fields
1.3.1. Indexed fields should be declared NOT NULL and given a default value.
1.3.2. Do not use float or double to store decimals, to prevent loss of precision; use decimal instead.
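A quick Python illustration of why binary floating point is risky for exact values such as money, and how a decimal type (the analogue of MySQL's DECIMAL) keeps exact digits:

```python
from decimal import Decimal

# Binary floats cannot represent 0.1 exactly, so sums drift:
float_total = 0.1 + 0.2
print(float_total)                       # 0.30000000000000004

# Decimal stores exact decimal digits, like MySQL's DECIMAL column type:
exact_total = Decimal("0.1") + Decimal("0.2")
print(exact_total)                       # 0.3
```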
1.3.3. Avoid Text/Blob columns for large data: reading and writing large text incurs heavy I/O and occupies MySQL's cache, and under high concurrency it greatly reduces database throughput. Store large text in a dedicated file storage system and keep only the file's access address in MySQL. For example, store blog posts as files and save only each file's relative path in MySQL.
1.3.4. Keep varchar columns no longer than 8K.
1.3.5. Prefer datetime over timestamp for time columns. Although datetime occupies 8 bytes while timestamp occupies only 4, timestamp must be non-null and is time-zone sensitive.
1.3.6. Add two columns, gmt_create and gmt_modified, to every table to record when each row was created and last modified. These two fields make troubleshooting much easier.
1.4 Index creation
1.4.1. At this stage you do not yet understand the business, so do not add indexes blindly; add ordinary indexes only on fields that will certainly be queried.
1.4.2. A single-column InnoDB index must not exceed 767 bytes; if the column is longer, the first 255 bytes are used as a prefix index.
1.4.3. For an InnoDB composite index, each column's indexed length must likewise not exceed 767 bytes, and the total must not exceed 3072 bytes.
2. SQL optimization
Generally speaking, SQL statements fall into just a few types: basic insert/delete/update/select, paged queries, range queries, fuzzy (LIKE) searches, and multi-table joins.
2.1 Basic queries
Queries generally need to hit an index. If a query has no usable index, try to rewrite it to filter on an indexed field. If the business scenario makes that impossible, check how often the query runs: if the volume is large, say 100,000+ calls per day, a new index is needed; if it is small, say around 100 calls per day, it can be left as is. Also, use select * as little as possible: list only the fields the statement actually needs, because fetching unnecessary fields wastes I/O and memory.
2.2 Efficient paging
The essence of limit m,n is to fetch m+n rows and then return n rows starting from row m. As you page backward, m grows and performance drops. For example, select * from A limit 100000,10 performs very poorly. Rewrite it as:
select id,name,age from A where id >= (select id from A limit 100000,1) limit 10
The subquery walks only the narrow primary-key index to find the starting id, so the outer query reads just the 10 rows it actually needs.
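The same trick can be sketched with SQLite in memory (MySQL's `LIMIT m,n` syntax differs slightly from the `LIMIT ... OFFSET ...` used here; the table and column names are illustrative):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE A (id INTEGER PRIMARY KEY, name TEXT, age INT)")
conn.executemany("INSERT INTO A (name, age) VALUES (?, ?)",
                 [(f"user{i}", 20 + i % 50) for i in range(1000)])

# Naive deep page: the engine walks past and discards the first 900 rows.
naive = conn.execute(
    "SELECT id, name, age FROM A ORDER BY id LIMIT 10 OFFSET 900").fetchall()

# Deferred lookup: find the starting id via the primary key first,
# then fetch only the 10 full rows that are actually needed.
fast = conn.execute("""
    SELECT id, name, age FROM A
    WHERE id >= (SELECT id FROM A ORDER BY id LIMIT 1 OFFSET 900)
    ORDER BY id LIMIT 10""").fetchall()

print(naive == fast)   # True: same page, far less row data touched
```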
2.3 Range queries
Range queries include between, greater than/less than, and in. The number of values allowed in a MySQL in list is limited: with few values the index can be used, but with very many values it degrades to a full table scan. For between and the comparison operators, note that in a composite index the columns after a range condition cannot use the index, so place range conditions after the indexed equality conditions.
2.4 Fuzzy query (like)
A condition such as like '%name%' cannot use an index and amounts to a full table scan. With little data this is not much of a problem, but with a lot of data performance drops drastically. At large data volumes, use a search engine instead of this kind of fuzzy search; if that is not possible, add an indexed condition before the fuzzy one. (A prefix pattern such as like 'name%' can still use the index.)
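This difference shows up directly in query plans. The SQLite sketch below (MySQL behaves the same way for LIKE) shows a leading-wildcard pattern forcing a scan while a prefix pattern can use the index; the pragma and table/index names are specific to this illustration:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA case_sensitive_like = ON")  # lets LIKE 'x%' use an index in SQLite
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("CREATE INDEX idx_name ON users(name)")
conn.executemany("INSERT INTO users (name) VALUES (?)",
                 [("alice",), ("bob",), ("albert",)])

def plan(sql):
    """Return the query plan as one string."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    return " ".join(r[-1] for r in rows)

p_contains = plan("SELECT * FROM users WHERE name LIKE '%li%'")
p_prefix   = plan("SELECT * FROM users WHERE name LIKE 'al%'")

print(p_contains)   # full table scan: plan contains 'SCAN'
print(p_prefix)     # index range search via idx_name
```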
2.5 Multi-table joins
Both subqueries and joins can fetch data from multiple tables, but subqueries often perform poorly; rewrite them as joins where possible. MySQL executes joins with the Nested Loop Join algorithm: for each row in the driving table's result set, it probes the next table. If the driving result set has 100 rows and the second table has 100,000 rows, up to 100 x 100,000 row combinations may be examined to produce the final result. Therefore, drive the join with the table that produces the smaller result set, and create an index on the join column. If an index cannot be built, set a sufficiently large join buffer size. If none of these techniques solves the performance problem, stop using the join and split it into two simple queries. Also, avoid joining more than three tables; such joins generally perform very badly, and the SQL should be split.
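A toy cost model of the difference (plain Python, not MySQL's actual executor, and scaled down to 100 x 10,000 rows): without an index on the join column every driving row is compared against every inner row, while an index turns each driving row into a single probe:

```python
def nested_loop_comparisons(outer, inner):
    """Join outer to inner with no index: compare every pair."""
    count, result = 0, []
    for o in outer:
        for i in inner:
            count += 1
            if o == i:
                result.append((o, i))
    return count, result

def indexed_comparisons(outer, inner):
    """Join with a dict standing in for an index on the join column."""
    index = {}
    for i in inner:
        index.setdefault(i, []).append(i)
    count, result = 0, []
    for o in outer:
        count += 1                    # one index probe per driving row
        for i in index.get(o, []):
            result.append((o, i))
    return count, result

outer = list(range(100))        # small driving result set
inner = list(range(10_000))     # larger inner table
c1, r1 = nested_loop_comparisons(outer, inner)
c2, r2 = indexed_comparisons(outer, inner)
print(c1, c2)                   # 1000000 100
assert r1 == r2                 # same join result either way
```

This is why the small result set should drive the join and the join column should be indexed.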
3. Database connection pool optimization
The database connection pool is essentially a cache, and a means of withstanding high concurrency. Connection pool optimization is mainly parameter tuning. Taking the commonly used DBCP pool as an example, the main parameters are as follows:
3.1 initialSize
The number of initial connections. "Initial" here means the first time getConnection is called, not when the application starts. Set it to roughly the historical average concurrency.
3.2 minIdle
The minimum number of idle connections to retain. DBCP runs a background thread to recycle idle connections, keeping at least minIdle of them. Usually set to 5; if concurrency is genuinely low, it can be set to 1.
3.3 maxIdle
The maximum number of idle connections to retain, set according to the business's concurrency peak. For example, if the peak is 20, then when the peak passes those connections are not recycled immediately; if another peak arrives shortly afterward, the pool can reuse the idle connections instead of frequently creating and closing them.
3.4 maxActive
The maximum number of active connections, set according to the highest acceptable concurrency. For example, if a single machine can accept at most 100 concurrent requests, then with maxActive set to 100 the pool serves at most 100 requests at once, and excess requests are rejected after the maximum wait time. This value must be set to guard against malicious concurrency attacks and protect the database.
3.5 maxWait
The maximum time to wait for a connection. Set it short, e.g. 3s, so requests fail fast: a thread waiting for a connection cannot be released, and a single machine's thread concurrency is limited. If this time is set too long, say the 60s often recommended online, the waiting thread is tied up for the whole 60s; with enough such requests the application runs out of available threads and the service becomes unavailable.
3.6 minEvictableIdleTimeMillis
How long a connection may remain idle before being recycled; the default is 30 minutes.
3.7 validationQuery
The SQL statement used to check whether a connection is still valid, usually something simple such as select 1. It should be set.
3.8 testOnBorrow
Validates the connection each time one is borrowed. Not recommended: it seriously hurts performance.
3.9 testOnReturn
Validates the connection each time one is returned. Not recommended: it seriously hurts performance.
3.10 testWhileIdle
When enabled, the background eviction thread periodically validates idle connections and removes broken ones. It does not affect foreground performance; recommended.
3.11 numTestsPerEvictionRun
The number of connections checked on each eviction run. Set it as large as maxActive so that every connection is checked each time.
3.12 Pre-warming the connection pool
It is recommended to pre-warm the pool at application startup by running a few simple SQL queries before serving external traffic, so the pool is already filled with the necessary number of connections.
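A minimal, generic sketch of pool pre-warming (DBCP itself is a Java library; the class and parameter names below are illustrative, with in-memory SQLite standing in for MySQL):

```python
import sqlite3
from queue import Queue

class TinyPool:
    """Illustrative pool: opens initial_size connections eagerly,
    validating each with a validationQuery before pooling it."""

    def __init__(self, initial_size, validation_query="SELECT 1"):
        self._pool = Queue()
        self.validation_query = validation_query
        for _ in range(initial_size):            # pre-warm: open eagerly
            conn = sqlite3.connect(":memory:")
            conn.execute(self.validation_query)  # verify before pooling
            self._pool.put(conn)

    def get_connection(self):
        return self._pool.get()

    def release(self, conn):
        self._pool.put(conn)

    def size(self):
        return self._pool.qsize()

pool = TinyPool(initial_size=5)
print(pool.size())                           # 5: ready before first request
conn = pool.get_connection()
print(conn.execute("SELECT 1").fetchone())   # (1,)
pool.release(conn)
```

The first real request then reuses a validated connection instead of paying the connection-establishment cost.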
4. Index optimization
When the data volume grows to a certain point, SQL tuning alone can no longer improve performance, and the heavy weapon is needed: indexes. Indexing can be applied at three levels, and mastering these three is generally enough. In addition, the selectivity of the indexed fields must be considered.
4.1 Level one
Create indexes on the columns in the where condition. A single column gets an ordinary index; multiple columns get a composite index. Composite indexes must respect the leftmost-prefix principle.
4.2 Level two
If a field is used by order by or group by, consider indexing it. Because an index is naturally ordered, the sort incurred by order by and group by can be avoided, improving performance.
4.3 Level three
If the two levels above are still not enough, index the selected fields as well, forming what is called a covering index. This saves one I/O operation: normally MySQL first searches the ordinary (secondary) index, obtains the primary key, and then looks up the clustered primary-key index to fetch the row. If all the needed columns are already in the ordinary index, that last step is unnecessary. Of course, this indexing approach is rather extreme and not suitable for ordinary scenarios.
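SQLite can demonstrate the same idea; its plan output labels the case explicitly as a covering index (MySQL's EXPLAIN shows "Using index" instead). The table and index names are illustrative:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER PRIMARY KEY, a INT, b INT)")
conn.execute("CREATE INDEX idx_a_b ON t(a, b)")

def plan(sql):
    """Return the query plan as one string."""
    return " ".join(row[-1] for row in
                    conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall())

# b is stored inside idx_a_b, so the base table is never touched:
p = plan("SELECT b FROM t WHERE a = 1")
print(p)    # plan mentions 'COVERING INDEX idx_a_b'
```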
4.4 Index selectivity
When building indexes, prefer highly selective fields. High selectivity means the field narrows the result down to few rows. For example, looking a person up by name usually returns very little data, while looking up by gender may return half the table. So name is a highly selective field and gender is a low-selectivity one.
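Selectivity is commonly estimated as distinct values divided by total rows (MySQL reports the distinct-value estimate as "cardinality" in SHOW INDEX output). A small sketch with made-up sample data:

```python
def selectivity(values):
    """Distinct values / total rows: closer to 1.0 means a more
    useful index candidate."""
    return len(set(values)) / len(values)

names   = ["alice", "bob", "carol", "dave", "erin", "frank"]
genders = ["f", "m", "f", "m", "f", "m"]

print(selectivity(names))     # 1.0   -> great index candidate
print(selectivity(genders))   # ~0.33 -> poor index candidate
```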
5. Historical data archiving
When the data grows by around 5 million rows a year, indexes can no longer help. The usual idea at this point is sharding. But if the business is not growing explosively and the data is only growing slowly, you can skip the complexity of sharding and archive historical data instead: move data whose life cycle has ended, such as rows older than 6 months, out of the main tables. A scheduled job (for example, using quartz) can run in the early morning, select the rows older than 6 months, and store them on a remote HBase server. A query interface for the historical data should still be provided for emergencies.
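A sketch of such an archiving job, with SQLite tables standing in for the live database and the archive store (names are illustrative; the article's actual suggestion is HBase plus a quartz schedule):

```python
import sqlite3
from datetime import datetime, timedelta

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, gmt_create TEXT)")
conn.execute("CREATE TABLE orders_archive (id INTEGER PRIMARY KEY, gmt_create TEXT)")

# Fixed 'now' so the example is deterministic.
now = datetime(2024, 6, 1)
rows = [(1, "2023-09-01"), (2, "2024-05-20"), (3, "2023-01-15")]
conn.executemany("INSERT INTO orders VALUES (?, ?)", rows)

cutoff = (now - timedelta(days=180)).strftime("%Y-%m-%d")

# Copy expired rows to the archive, then delete them from the live table.
conn.execute("INSERT INTO orders_archive "
             "SELECT * FROM orders WHERE gmt_create < ?", (cutoff,))
conn.execute("DELETE FROM orders WHERE gmt_create < ?", (cutoff,))
conn.commit()

print(conn.execute("SELECT COUNT(*) FROM orders").fetchone()[0])           # 1
print(conn.execute("SELECT COUNT(*) FROM orders_archive").fetchone()[0])   # 2
```

In production the copy and delete should run in batches inside transactions so the live table is never locked for long.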
The above covers some practices for MySQL stand-alone database optimization; more material may be added in the future.
