The Ultimate Guide - How to Write Better SQL Queries?
Implicit in the anti-patterns discussed earlier is the difference between set-based and procedural approaches to building queries.
- The procedural approach to querying is very similar to programming: you tell the system what needs to be done and how to do it. For example, as in the example in the previous article, you query the database by executing one function and then calling another, or you use logic involving loops, conditions, and user-defined functions (UDFs) to obtain the final query result. With this approach you are always requesting a subset of the data in each layer. It is also often referred to as step-by-step or row-by-row querying.
- The other is the set-based approach, where you only specify what needs to be done. You state the conditions and requirements for the result you want the query to return. You do not need to care about the internal mechanisms that implement the query: the database engine determines the best algorithm and logic to execute it.
Since SQL is set-based, this approach is usually more efficient than the procedural one, which explains why, in some cases, SQL can work faster than application code.
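The contrast between the two approaches can be sketched in a few lines. This is only an illustration under stated assumptions: it uses Python's built-in sqlite3 module rather than PostgreSQL, and the orders table and its rows are hypothetical, not taken from the article.

```python
import sqlite3

# Hypothetical table and data, used only for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (customer TEXT, amount INTEGER)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [("ann", 10), ("bob", 5), ("ann", 7), ("bob", 1), ("cy", 3)],
)


def totals_procedural(conn):
    """Row-by-row: fetch every row and do the grouping in application code."""
    totals = {}
    for customer, amount in conn.execute("SELECT customer, amount FROM orders"):
        totals[customer] = totals.get(customer, 0) + amount
    return totals


def totals_set_based(conn):
    """Set-based: describe the result and let the engine decide how to compute it."""
    rows = conn.execute(
        "SELECT customer, SUM(amount) FROM orders GROUP BY customer"
    )
    return dict(rows)


procedural = totals_procedural(conn)
set_based = totals_set_based(conn)
print(procedural == set_based)
```

Both functions return the same totals. The difference is who decides how the work is done: the loop fixes the algorithm in application code, while the GROUP BY query leaves that choice to the database engine.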
Set-based querying is also a skill that the data mining and analysis industry expects you to master, because you need to be able to switch between the two approaches. If you find procedural logic in your queries, consider whether that part should be rewritten.
Anti-patterns are not static. As you grow as a SQL developer, avoiding query anti-patterns and rewriting queries can be a daunting task, so you will often need tools to optimize your queries in a more structured way.
Thinking about performance requires not only a more structured approach, but also a more in-depth one.
This structured, in-depth approach is primarily based on query plans. The query is first parsed into a "parse tree"; the resulting query plan defines exactly which algorithm is used for each operation and how the operations are coordinated.
Query Optimization
When optimizing a query, you will most likely need to inspect the plan generated by the optimizer manually. In that case, you analyze your query again by looking at its query plan.
To get hold of the query plan, you need the tools that your database management system provides. Some of the tools you may have at your disposal are:
- Some packages include tools that generate a graphical representation of a query plan.
- Other tools give you a textual description of the query plan.
Note that if you are using PostgreSQL, you can distinguish between two forms of EXPLAIN. Plain EXPLAIN only describes how the planner intends to execute the query; it does not run it. EXPLAIN ANALYZE actually executes the query and returns a report that shows the estimated plan alongside what actually happened. Generally speaking, an actual execution plan comes from really running the query, while an estimated execution plan is worked out without executing it. The actual execution plan is usually more useful because it contains additional details and statistics about what happened when the query ran.
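The difference between asking for a plan and actually running the query can be sketched as follows. This is an assumption-laden illustration, not PostgreSQL itself: it uses SQLite's EXPLAIN QUERY PLAN through Python's sqlite3 module, a 1000-row stand-in for the one_million table, and a hypothetical payload column. The helper names estimated_plan and actual_run are made up for this sketch.

```python
import sqlite3
import time

# Small stand-in for the article's one_million table (payload is hypothetical).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE one_million (counter INTEGER, payload TEXT)")
conn.executemany(
    "INSERT INTO one_million VALUES (?, ?)",
    ((i, f"row {i}") for i in range(1000)),
)


def estimated_plan(conn, query):
    """Ask the planner for its plan; the query itself is not executed."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + query).fetchall()
    # The last column of each row holds the human-readable plan step.
    return [row[-1] for row in rows]


def actual_run(conn, query):
    """Execute the query and time it, roughly what EXPLAIN ANALYZE adds."""
    start = time.perf_counter()
    row_count = len(conn.execute(query).fetchall())
    return row_count, time.perf_counter() - start


query = "SELECT * FROM one_million"
plan = estimated_plan(conn, query)
row_count, elapsed = actual_run(conn, query)
print(plan)
print(row_count, f"{elapsed * 1000:.3f} ms")
```

The exact wording of the plan text depends on the SQLite version, so treat the printed plan as indicative only.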
Next you will learn more about EXPLAIN and ANALYZE, and how to use these two commands to better understand your query plans and query performance. To do this, you will work through some examples using two tables: one_million and half_million.
You can use EXPLAIN to retrieve the current plan information for the one_million table: put it directly in front of the query, and when you run it, the query plan is returned:
EXPLAIN
SELECT * FROM one_million;

QUERY PLAN
____________________________________________________
Seq Scan on one_million (cost=0.00..18584.82 rows=1025082 width=36)
(1 row)
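In the above example, we see that the estimated cost of the query is 0.00..18584.82 (startup cost and total cost), the estimated number of rows is 1025082, and the estimated average row width is 36 bytes.
You can also use ANALYZE to refresh the table statistics that the planner relies on; note how the estimates change afterwards: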
ANALYZE one_million;

EXPLAIN
SELECT * FROM one_million;

QUERY PLAN
____________________________________________________
Seq Scan on one_million (cost=0.00..18334.00 rows=1000000 width=37)
(1 row)
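To see what "updating statistics" means in practice, here is a minimal sketch using SQLite as a stand-in for PostgreSQL. It assumes a 1000-row version of one_million and an index named one_million_counter_idx. In SQLite, ANALYZE stores what it gathers in the sqlite_stat1 table; PostgreSQL keeps its statistics elsewhere, so only the idea carries over, not the table name.

```python
import sqlite3

# Small stand-in for one_million; the index gives ANALYZE something to measure.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE one_million (counter INTEGER)")
conn.execute("CREATE INDEX one_million_counter_idx ON one_million(counter)")
conn.executemany(
    "INSERT INTO one_million VALUES (?)", ((i,) for i in range(1000))
)

# Gather statistics for the planner.
conn.execute("ANALYZE")

# SQLite-specific: the gathered statistics are readable from sqlite_stat1.
stats = conn.execute(
    "SELECT tbl, idx, stat FROM sqlite_stat1 WHERE tbl = 'one_million'"
).fetchall()
for tbl, idx, stat in stats:
    # The first number in stat is the approximate row count of the index.
    print(tbl, idx, stat)
```

A planner with fresh row counts can produce better estimates, which is the same effect visible in the changed rows= value of the plan above.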
In addition to EXPLAIN and ANALYZE, you can also use EXPLAIN ANALYZE to retrieve the actual execution time:
EXPLAIN ANALYZE
SELECT * FROM one_million;

QUERY PLAN
___________________________________________________
Seq Scan on one_million (cost=0.00..18334.00 rows=1000000 width=37)
(actual time=0.015..1207.019 rows=1000000 loops=1)
Total runtime: 2320.146 ms
(2 rows)
The disadvantage of EXPLAIN ANALYZE is that the query is actually executed, so use it with care!
All the algorithms we have seen so far are sequential scans, also called full table scans: every row of the table is read in sequential (serial) order, and its columns are checked against the query's conditions. In terms of performance, a sequential scan is not the best execution plan, because the entire table has to be read. That said, sequential reads are relatively fast even on slow disks.
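As a rough mental model, a sequential scan behaves like the loop below. This is a simplified sketch in plain Python, not how PostgreSQL implements it; the seq_scan function and the ten-row table are hypothetical.

```python
def seq_scan(rows, predicate):
    """Read every row in storage order and keep the ones that match."""
    matches = []
    rows_read = 0
    for row in rows:
        rows_read += 1  # every row is visited, whether or not it matches
        if predicate(row):
            matches.append(row)
    return matches, rows_read


# Hypothetical ten-row table standing in for one_million.
table = [{"counter": i, "payload": f"row {i}"} for i in range(10)]

matches, rows_read = seq_scan(table, lambda row: row["counter"] == 7)
print(matches, rows_read)
```

Even though only one row matches, all ten rows are read. The work grows linearly with the size of the table, which is why a full scan gets expensive on large tables.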
There are also some examples of other algorithms:
EXPLAIN ANALYZE
SELECT *
FROM one_million JOIN half_million
ON (one_million.counter=half_million.counter);

QUERY PLAN
_________________________________________________________________
Hash Join (cost=15417.00..68831.00 rows=500000 width=42)
(actual time=1241.471..5912.553 rows=500000 loops=1)
  Hash Cond: (one_million.counter = half_million.counter)
    -> Seq Scan on one_million
    (cost=0.00..18334.00 rows=1000000 width=37)
    (actual time=0.007..1254.027 rows=1000000 loops=1)
    -> Hash (cost=7213.00..7213.00 rows=500000 width=5)
    (actual time=1241.251..1241.251 rows=500000 loops=1)
    Buckets: 4096 Batches: 16 Memory Usage: 770kB
    -> Seq Scan on half_million
    (cost=0.00..7213.00 rows=500000 width=5)
    (actual time=0.008..601.128 rows=500000 loops=1)
Total runtime: 6468.337 ms
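The Hash Join node in this plan follows a build-then-probe pattern: a hash table is built from one input (here half_million) and then probed with each row of the other. The sketch below shows the textbook in-memory version of that idea in plain Python. It is a simplification, not PostgreSQL's implementation, which also handles batching to disk; the hash_join function and the tiny tables are hypothetical.

```python
from collections import defaultdict


def hash_join(build_rows, probe_rows, key):
    """Textbook in-memory hash join on a single equality key."""
    # Build phase: hash the (ideally smaller) input on the join key.
    buckets = defaultdict(list)
    for row in build_rows:
        buckets[row[key]].append(row)

    # Probe phase: look up each row of the other input in the hash table.
    joined = []
    for row in probe_rows:
        for match in buckets.get(row[key], ()):
            joined.append((row, match))
    return joined


# Tiny stand-ins for the article's tables.
one_million = [{"counter": i} for i in range(6)]
half_million = [{"counter": i} for i in range(0, 6, 2)]

joined = hash_join(half_million, one_million, "counter")
print([left["counter"] for left, _ in joined])
```

Each input is read once, so the cost is roughly proportional to the combined size of both tables, assuming the hash table fits in memory.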
We can see that the query optimizer selected a Hash Join. Remember this operation, because we will need it to evaluate the time complexity of the query. Notice that there is no index on half_million.counter in the above example; we add one in the following example:
CREATE INDEX ON half_million(counter);

EXPLAIN ANALYZE
SELECT *
FROM one_million JOIN half_million
ON (one_million.counter=half_million.counter);

QUERY PLAN
______________________________________________________________
Merge Join (cost=4.12..37650.65 rows=500000 width=42)
(actual time=0.033..3272.940 rows=500000 loops=1)
  Merge Cond: (one_million.counter = half_million.counter)
    -> Index Scan using one_million_counter_idx on one_million
    (cost=0.00..32129.34 rows=1000000 width=37)
    (actual time=0.011..694.466 rows=500001 loops=1)
    -> Index Scan using half_million_counter_idx on half_million
    (cost=0.00..14120.29 rows=500000 width=5)
    (actual time=0.010..683.674 rows=500000 loops=1)
Total runtime: 3833.310 ms
(5 rows)
With the index in place, the query optimizer now chooses a Merge Join that is fed by Index Scans on both tables.
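A merge join works on two inputs that are already sorted on the join key, which is exactly what the index scans in the plan above deliver. The sketch below is a textbook version in plain Python, given as an illustration rather than PostgreSQL's implementation; the merge_join function and the sample rows are hypothetical.

```python
def merge_join(left, right, key):
    """Join two inputs that are both already sorted on key."""
    joined = []
    i = j = 0
    while i < len(left) and j < len(right):
        left_key, right_key = left[i][key], right[j][key]
        if left_key < right_key:
            i += 1  # left side is behind, advance it
        elif left_key > right_key:
            j += 1  # right side is behind, advance it
        else:
            # Find the run of equal keys on the right side.
            j_end = j
            while j_end < len(right) and right[j_end][key] == left_key:
                j_end += 1
            # Pair every left row with that key against the whole run.
            while i < len(left) and left[i][key] == left_key:
                for m in range(j, j_end):
                    joined.append((left[i], right[m]))
                i += 1
            j = j_end
    return joined


# Sorted stand-ins for the two indexed tables; the right side has a duplicate.
left_rows = [{"counter": i} for i in range(6)]
right_rows = [{"counter": 0}, {"counter": 2}, {"counter": 2}, {"counter": 4}]

joined = merge_join(left_rows, right_rows, "counter")
print([l["counter"] for l, _ in joined])
```

The loop stops as soon as either input is exhausted. That may be why the plan above reports only 500001 rows read from one_million: once half_million runs out, there is nothing left to match.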
Please note the difference between an index scan and a full table scan (sequential scan): the latter, also called a "table scan", reads every row of the table to find the matching results, while the former walks the index and fetches only the rows it points to.
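The switch from a full scan to an index lookup can be observed directly in a query plan. The sketch below again uses SQLite through Python's sqlite3 module as a stand-in for PostgreSQL, with a 1000-row half_million table and a hypothetical payload column. SQLite words its plans differently (SCAN versus SEARCH ... USING INDEX), but the distinction is the same.

```python
import sqlite3

# Small stand-in for half_million (payload is hypothetical).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE half_million (counter INTEGER, payload TEXT)")
conn.executemany(
    "INSERT INTO half_million VALUES (?, ?)",
    ((i, f"row {i}") for i in range(1000)),
)


def plan_steps(conn, query):
    """Return the readable plan steps without executing the query."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + query).fetchall()
    return [row[-1] for row in rows]


query = "SELECT * FROM half_million WHERE counter = 42"

# Without an index the planner has to read the whole table.
plan_before = plan_steps(conn, query)

# With an index it can jump straight to the matching rows.
conn.execute("CREATE INDEX half_million_counter_idx ON half_million(counter)")
plan_after = plan_steps(conn, query)

print(plan_before)
print(plan_after)
```

The exact plan text varies between SQLite versions, so the printed lines are indicative only.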
This concludes the second part of the tutorial. The final article in the series "How to Write Better SQL Queries" will follow, so stay tuned.
Please indicate the source of reprinting: Grape City Control