Summary of issues to consolidate the foundation of MySQL-Mysql Tutorial-php.cn

This article brings you relevant knowledge about mysql. It mainly summarizes some common problems and solves them, including conventional ones, as well as index classes, principle classes and framework classes. Related content, I hope it will be helpful to everyone.

Summary of issues to consolidate the foundation of MySQL

Recommended learning: mysql video tutorial

##General articles

1. Talk about the three major paradigms of database?

First normal form: field atomicity, second normal form: unique rows and primary key columns, third normal form: each column is related to the primary key column.

In actual applications, a small number of redundant fields will be used to reduce the number of related tables and improve query efficiency.

2. Only one piece of data is queried, but the execution is very slow. What are the common reasons?

SQL statements are blocked, such as: table locks, row locks, etc., resulting in The storage engine does not execute the corresponding SQL statement
It is indeed caused by improper use of the index and no indexing
The characteristics of the data in the table. The index is used, but the number of table returns is huge

3. What are the differences in the implementation methods of count(*), count(0), and count(id)?

count(*), count(constant), count(primary key) That is, the optimizer can choose the index with the smallest scan cost to execute the query, thereby improving efficiency. Their execution processes are the same.
count (non-index column), the optimizer chooses a full table scan, which means that it can only sequentially scan the leaf nodes of the clustered index.
count (secondary index column)Only the index containing the columns we specify can be selected to execute the query, which may cause the index execution cost selected by the optimizer to be not the smallest.

4. What should I do if I accidentally delete data?

1) If the amount of data is relatively large, use physical backup xtrabackup. Regularly perform full backups of the database, and you can also perform incremental backups.

2) If the amount of data is small, use mysqldump or mysqldumper, and then use binlog to recover or set up a master-slave method to recover the data. You can recover from the following points:

DML misoperation statement: You can use flashback to first parse the binlog event and then reverse it.
DDL statement misoperation: Data can only be restored through full backup and binlog application. Once the amount of data is relatively large, the recovery time will be particularly long.
rm Deletion: Use backup across computer rooms, or preferably across cities.

5. The difference between drop, truncate and delete

The DELETE statement deletes one row from the table at a time, and at the same time The row deletion operation is saved in the log as a transaction record for rollback operation.
TRUNCATE TABLE deletes all data from the table at once and does not record individual deletion operation records in the log. Deleted rows cannot be recovered. And the deletion trigger related to the table will not be activated during the deletion process, and the execution speed is fast.
The drop statement releases all the space occupied by the table.

6. Why does MySQL large table query not burst the memory?

MySQL is "sending while reading", which means that if the client receives slowly, the MySQL server will not be able to send the results because the execution time of this transaction will be reduced. lengthen.
The server does not need to save a complete result set. The processes of getting and sending data are all operated through a next_buffer.
Memory data pages are managed in Buffer Pool (BP).
InnoDB manages Buffer Pool using an improved LRU algorithm, which is implemented using a linked list. In InnoDB implementation, the entire LRU linked list is divided into young area and old area according to the ratio of 5:3 to ensure that hot data will not be washed away when cold data is loaded in large batches.

7. How to deal with deep paging (extremely large paging)?

Optimize with id: first find the maximum ID of the last paging, and then use Query using the index on id, similar to select * from user where id>1000000 limit 100.
Use covering index optimization: When a Mysql query completely hits the index, it is called a covering index, which is very fast because the query only needs to search on the index and can then return directly without going back to the table. Get the data. So we can first find out the ID of the index, and then get the data based on the ID.
Limit the number of pages if business permits

8. How do you optimize SQL in daily development?

Add appropriate indexes: Create an index for the fields used as query conditions and order by, consider multiple query fields to establish a combined index, and pay attention to the order of the combined index fields, which will be the most commonly used The columns used as restrictive conditions are placed on the far left, in descending order. The indexes should not be too many, generally within 5.
Optimize table structure: Numeric fields are better than string types. Smaller data types are usually better. Try to use NOT NULL
Optimize query statements: analyze SQl execution plan, whether it hits the index, etc. , if the SQL is very complex, optimize the SQL structure. If the amount of table data is too large, consider sub-tables

9. What is the difference between concurrent connections and concurrent queries in MySQL?

In the result of executing show processlist, I saw thousands of connections, which refers to concurrent connections.
The statement "currently executing" is a concurrent query.
The number of concurrent connections affects memory.
Concurrent queries that are too high are detrimental to the CPU. A machine has a limited number of CPU cores and if all threads rush in, the cost of context switching will be too high.
It should be noted that after a thread enters the lock wait, the concurrent thread count is reduced by one, so threads waiting for row locks or gap locks are not included in the count range. That is to say, the thread waiting for the lock does not consume the CPU, thereby preventing the entire system from locking up.

10. How does MySQL operate internally when updating a field value to the original value?

When the same data is used, the update will not be performed.
However, different binlog formats have different log processing methods:
- 1) When based on row mode, the server layer matches the record to be updated and finds that the new value is consistent with the old value. , returns directly without updating, and does not record binlog.
- 2) When based on statement or mixed format, MySQL executes the update statement and records the update statement to binlog.

#11. What is the difference between datetime and timestamp?

The date range of datetime is 1001-9999; the time range of timestamp is 1970-2038
The datetime storage time has nothing to do with the time zone; the timestamp storage time has nothing to do with Time zone related, the displayed value also depends on the time zone
The storage space of datetime is 8 bytes; the storage space of timestamp is 4 bytes
The default value of datetime is null; the default value of timestamp field Not null, the default value is the current time (current_timestamp)

12. What are the isolation levels of transactions?

"Read Uncommitted" is the lowest level and cannot be guaranteed under any circumstances
"Read Committed" can avoid the occurrence of dirty reads
"Repeatable Read" can avoid dirty reads and non-repeatable reads
"Serializable" can avoid dirty reads, non-repeatable reads and phantom reads The occurrence
The default transaction isolation level of Mysql is "Repeatable Read"

13. There are two kill commands in MySQL

kill query thread id, indicating to terminate the statement being executed in this thread
kill connection thread id, where connection can be defaulted, indicating to disconnect the thread

Index

#1. What are the index categories?

According to the content of the leaf node, the index type is divided into primary key index and non-primary key index.
The leaf node of the primary key index stores the entire row of data. In InnoDB, the primary key index is also called a clustered index.
The leaf node content of the non-primary key index is the value of the primary key. In InnoDB, non-primary key indexes are also called secondary indexes.

2. What is the difference between clustered index and non-clustered index?

Clustered index: The clustered index is an index created with the primary key. The clustered index stores the data in the table in the leaf nodes.
Non-clustered index: The index created by non-primary key stores the primary key and index column in the leaf node. When using the non-clustered index to query the data , get the primary key on the leaf and then find the data you want to find. (The process of getting the primary key and then searching for it is called table return).
Covered index: Assuming that the columns being queried happen to be the columns corresponding to the index, and there is no need to go back to the table to look up, then this index column is called Cover index.

#3. Why does InnoDB design B-tree instead of B-Tree, Hash, binary tree, and red-black tree?

Hash index can handle the addition, deletion, modification and query of a single data row at O(1) speed, but when faced with range queries or sorting, it will lead to the results of a full table scan.
B-tree can store data in non-leaf nodes. Since all nodes may contain target data, we always have to traverse the subtree downward from the root node to find data rows that meet the conditions. This feature brings A large amount of random I/O comes, causing performance degradation.
All data rows of the B tree are stored in leaf nodes, and these leaf nodes can be connected in order through "pointers". When we traverse the data in the B tree as shown below, we can directly access multiple sub-trees. Jumping between nodes can save a lot of disk I/O time.
Binary tree: The height of the tree is uneven and cannot be self-balancing. The search efficiency is related to the data (the height of the tree), and the IO cost is high.
Red-black tree: The height of the tree increases as the amount of data increases, and the IO cost is high.

4. Let’s talk about clustered indexes and non-clustered indexes?

In InnoDB, the leaf node of the index B Tree stores the entire row of data is the primary key index, also called a clustered index, which puts the data storage and index together. When you find the index, you find the data.
The leaf nodes of the index B Tree store the value of the primary key and are non-primary key indexes, also called non-clustered indexes and secondary indexes.
The first index is generally sequential IO, and the operation of returning to the table is random IO. The more times we need to return to the table, that is, the more times we need random IO, the more we tend to use full table scans.

5. Will non-clustered indexes definitely return table queries?

Not necessarily, this involves whether all the fields required by the query statement hit the index. If all the fields hit the index, then there is no need to perform a query back to the table. An index contains (covers) the values of all fields that need to be queried, and is called a "covering index".

6. Talk about the leftmost prefix principle of MySQL?

The leftmost prefix principle is leftmost priority. When creating a multi-column index, according to business needs, the most frequently used column in the where clause is placed on the leftmost side.
MySQL will keep matching to the right until it encounters a range query (>, <, between, like), and then stops matching, such as a = 1 and b = 2 and c > 3 and d = 4 if Create an index in the order of (a, b, c, d). The index of d is not used. If you create an index of (a, b, d, c), you can use it. The order of a, b, d can be arbitrary. Adjustment.
= and in can be out of order, such as a = 1 and b = 2 and c = 3. You can create (a, b, c) indexes in any order. MySQL's query optimizer will help you optimize the index. form of identification.

7. What is index pushdown?

When the leftmost prefix principle is met, the leftmost prefix can be used to locate records in the index.
Before MySQL 5.6, you could only return tables one by one starting from the ID. Find the data row on the primary key index, and then compare the field values.
The index pushdown optimization (index condition pushdown) introduced in MySQL 5.6 can first judge the fields included in the index during the index traversal process, and directly filter out records that do not meet the conditions, reducing table returns. frequency.

8. Why does Innodb use auto-incrementing id as the primary key?

If the table uses an auto-increasing primary key, then every time a new record is inserted, the record will be added sequentially to the subsequent position of the current index node. When a page is full, it will be automatically opened A new page. If a non-auto-increasing primary key is used (such as ID number or student number, etc.), since the value of the primary key inserted each time is approximately random, each new record must be inserted somewhere in the middle of the existing index page, frequently. Moving and paging operations caused a large amount of fragmentation and resulted in an index structure that was not compact enough. Subsequently, OPTIMIZE TABLE (optimize table) had to be used to rebuild the table and optimize the filled pages.

9. How is the transaction ACID feature implemented?

"Atomicity": It is implemented using undo log. If an error occurs during transaction execution or the user performs rollback, the system returns the status of the transaction start through the undo log.
"Persistence": Use redo log to achieve this. As long as the redo log is persisted, when the system crashes, the data can be recovered through the redo log.
"Isolation": Transactions are isolated from each other through locks and MVCC.
"Consistency": Consistency is achieved through rollback, recovery, and isolation in concurrent situations.

10. What is the difference between MyISAM and InnoDB in how they implement B-tree indexes?

InnoDB storage engine: the leaf nodes of the B-tree index save the data itself;
MyISAM storage engine: the leaves of the B-tree index The physical address where the node saves data;
InnoDB, its data file itself is an index file. Compared with MyISAM, the index file and data file are separated. Its table data file itself is an index structure organized by B Tree, and the node data field of the tree is saved. The complete data record is obtained. The key of this index is the primary key of the data table. Therefore, the InnoDB table data file itself is the primary index. This is called a "clustered index" or clustered index, and the remaining indexes are used as auxiliary indexes. Auxiliary indexes The data field stores the value of the corresponding record's primary key instead of the address. This is also different from MyISAM.

11. What are the categories of indexes?

According to the content of the leaf node, the index type is divided into primary key index and non-primary key index.
The leaf node of the primary key index stores the entire row of data. In InnoDB, the primary key index is also called a clustered index.
The leaf node content of the non-primary key index is the value of the primary key. In InnoDB, non-primary key indexes are also called secondary indexes.

12. What scenarios will cause index failure?

Background: The fast positioning capability provided by the B-tree comes from the orderliness of sibling nodes on the same layer. Therefore, if this orderliness is destroyed, it will most likely fail. Specifically There are the following situations:

# Use left or left fuzzy matching on the index: that is, like %xx or like %xx%. Both of these methods will cause the index to fail. The reason is that the query results may be "Chen Lin, Zhang Lin, Zhou Lin" and so on, so we don't know which index value to start comparing with, so we can only query through full table scan.
Use functions for indexes/Expression calculations for indexes: Because the index saves the original value of the index field, rather than the value calculated by the function, there is no way to use the index. .
Implicit type conversion for the index: equivalent to using a new function
OR in the WHERE clause: means two as long as Just satisfy one, so it makes no sense if only one conditional column is an index column. As long as the conditional column is not an index column, a full table scan will be performed.

Proposal

1. There is a system that is not divided into databases and tables. How to design it so that the system can dynamically switch to divided databases and tables?

Stop expansion (not recommended)
Double-write migration plan: Design the expanded table structure plan, and then implement dual-write for single database and sub-database, observe After a week of no problems, turn off the read traffic of the single database and observe it for a while. After it continues to stabilize, turn off the write traffic of the single database and smoothly switch to the sub-database and sub-table.

2. How to design a sub-database and table scheme that can dynamically expand and reduce capacity?

Principle

1. What are the steps to execute a MySQL statement?

The steps for the Server layer to execute sql in sequence are:
Client request -> Connector (verify user identity, grant permissions) -> Query cache (return directly if cache exists, If it does not exist, subsequent operations will be performed) -> Analyzer (lexical analysis and syntax analysis operations on SQL) -> Optimizer (mainly selects the optimal execution plan method for executing SQL optimization) -> Executor (execution It will first check whether the user has execution permission before using the interface provided by this engine) -> Go to the engine layer to obtain the data return (if the query cache is turned on, the query results will be cached).

2. What is the internal principle of order by sorting?

MySQL will allocate a memory (sort_buffer) for each thread for sorting. The memory size is sort_buffer_size.
If the amount of data being sorted is less than sort_buffer_size, the sorting will be completed in memory.
If the amount of sorted data is large and cannot be stored in memory, temporary files on disk will be used to assist sorting, also known as external sorting.
When using external sorting, MySQL will divide it into several separate temporary files to store the sorted data, and then merge these files into one large file.

3. MVCC implementation principle?

MVCC (Multiversion concurrency control) is a way to retain multiple versions of the same data, thereby achieving concurrency control. When querying, find the data of the corresponding version through the read view and version chain.
Function: Improve concurrency performance. For high-concurrency scenarios, MVCC is less expensive than row-level locks.
The implementation of MVCC relies on the version chain, which is implemented through three hidden fields of the table.
- 1) DB_TRX_ID: Current transaction id, the time sequence of the transaction is judged by the size of the transaction id.
- 2) DB_ROLL_PRT: The rollback pointer points to the previous version of the current row record. Through this pointer, multiple versions of the data are connected together to form an undo log version chain.
- 3) DB_ROLL_ID: primary key. If the data table does not have a primary key, InnoDB will automatically generate a primary key.

#4. What is change buffer and what is its function?

5. How does MySQL ensure that data is not lost?

As long as redolog and binlog ensure persistent disks, data can be ensured after MySQL restarts abnormally. Restore binlog writing mechanism.
redolog ensures that lost data can be redone after a system exception, and binlog archives the data to ensure that lost data can be recovered.
Write redolog before transaction execution. During transaction execution, the log is first written to the binlog cache. When the transaction is submitted, the binlog cache is written to the binlog file.

#6. Why does the size of the table file remain unchanged after deleting the table?

After the data item is deleted, InnoDB marks page A and it will be marked as reusable.
What about the delete command to delete the entire table's data? As a result, all data pages will be marked as reusable. But on disk, the file does not get smaller.
Tables that have undergone a large number of additions, deletions, and modifications may have holes. These holes also take up space, so if these holes can be removed, the purpose of shrinking the table space can be achieved.
Rebuilding the table can achieve this purpose. You can use the alter table A engine=InnoDB command to rebuild the table.

7. Comparison of the three binlog formats

# The row format binlog records the primary key id of the operation row and the true value of each field. Therefore, there will be no inconsistency in the primary and secondary operating data.
statement: the recorded source SQL statement
mixed: the first two are mixed, why do you need a file in mixed format? Because some binlogs in statement format may cause inconsistency between the primary and secondary servers, so it is necessary Use row format. But the disadvantage of the row format is that it takes up a lot of space. MySQL has taken a compromise. MySQL itself will judge whether this SQL statement may cause inconsistency between the primary and secondary servers. If possible, use the row format, otherwise use the statement format.

8. MySQL locking rules

Principle 1: The basic unit of locking is next-key lock, and next-key lock is the previous Open and then close interval.
Principle 2: Only objects accessed during the search process will be locked
Optimization 1: Equivalent query on the index, when locking the unique index, next-key lock degenerates into Row lock.
Optimization 2: When traversing the equal value query on the index to the right and the last value does not meet the equality condition, the next-key lock degenerates into a gap lock
a Bug: A range query on a unique index will access the first value that does not meet the condition.

9. What are dirty reads, non-repeatable reads, and phantom reads?

"Dirty reading": Dirty reading refers to reading uncommitted data from other transactions. Uncommitment means that the data may be rolled back, which means that it may not be saved in the end. In the database, that is, data that does not exist. Reading data that may not eventually exist is called dirty reading.
"Non-repeatable read": Non-repeatable read refers to the situation where within a transaction, the data read at the beginning is inconsistent with the same batch of data read at any time before the end of the transaction.
"Phantom reading": Phantom reading does not mean that the result sets obtained by two reads are different. The focus of phantom reading is that the data status of the result obtained by a certain select operation cannot support subsequent business operations. . To be more specific: select whether a certain record exists. If it does not exist, prepare to insert the record. However, when executing insert, it is found that the record already exists and cannot be inserted. At this time, a phantom read occurs.

10. What kind of locks does MySQL have? Wouldn’t locking like the above hinder concurrency efficiency?

From the lock In terms of categories, there are shared locks and exclusive locks.
- 1) Shared lock: Also called read lock. When the user wants to read data, a shared lock is added to the data. Multiple shared locks can be added at the same time.
- 2) Exclusive lock: Also called write lock. When the user wants to write data, add an exclusive lock to the data. Only one exclusive lock can be added, and it and other exclusive locks and shared locks repel each other.
The granularity of locks depends on the specific storage engine. InnoDB implements row-level locks, page-level locks, and table-level locks.
Their locking overhead increases from large to small, and their concurrency capabilities also increase from large to small.

Framework

1. What is the principle of Mysql master-slave replication?

Master’s update events (update, insert, delete) will be written to bin-log in order. When the Slave is connected to the Master, the Master machine will open the binlog dump thread for the Slave, and this thread will read the bin-log log.
After the Slave is connected to the Master, the Slave library has an I/O thread reads the bin-log log by requesting the binlog dump thread, and then writes it to the slave library's relay logIn the log.
Slave also has a SQL thread, which monitors whether the relay-log log content is updated in real time, parses the SQL statements in the file, and executes them in the Slave database.

2. What are the Mysql master-slave replication synchronization methods?

Asynchronous replication: Mysql master-slave synchronization The default is asynchronous replication. That is, among the above three steps, only the first step is synchronous (that is, Mater writes the bin log log), that is, the master library can successfully return to the client after writing the binlog log, without waiting for the binlog log to be transferred to the slave library.
Synchronous replication: For synchronous replication, after the Master host sends the event to the Slave host, it will trigger a wait until all Slave nodes (if there are multiple Slave) return that the data replication is successful. information to the Master.
Semi-synchronous replication: For semi-synchronous replication, after the Master host sends the event to the Slave host, a wait will be triggered, until one of the Slave nodes (if (There are multiple Slaves) returns information about successful data replication to the Master.

3. What causes the Mysql master-slave synchronization delay? How to optimize it?

If the master node executes a large transaction, it will have a greater impact on the master-slave delay
Network Delay, large log, too many slaves
Multi-thread writing on the master, only single-thread synchronization on the slave node
Machine performance issues , whether the slave node uses a "bad machine"
Lock conflict problems may also cause the slave's SQL thread to execute slowly

4. What causes the Mysql master-slave synchronization delay? How to optimize it?

Large transactions: Divide large transactions into small transactions and update data in batches
Reduce the number of Slaves to no more than 5 and reduce the size of a single transaction
After Mysql 5.7, you can use multi-threaded replication and use the MGR replication architecture
In the case of problems with the disk, raid card, and scheduling strategy, a single IO delay may be very high. You can use Use the iostat command to check the IO status of the DB data disk, and then further judge
For lock problems, you can check the processlist and check the tables related to locks and transactions under information_schema.

6. What are bin log/redo log/undo log?

bin log is a file at the Mysql database level. It records all operations that modify the Mysql database. Select and show statements will not be recorded.
The data to be updated is recorded in the redo log. For example, if a piece of data has been successfully submitted, it will not be synchronized to the disk immediately. Instead, it will be recorded in the redo log first, and then wait for the right time to flush the disk. In order Implement transaction durability.
undo log is used for data recall operations. It retains the content before the record is modified. Transaction rollback can be achieved through undo log, and MVCC can be implemented by tracing back to a specific version of data based on undo log.

Recommended learning: mysql video tutorial

The above is the detailed content of Summary of issues to consolidate the foundation of MySQL. For more information, please follow other related articles on the PHP Chinese website!