The Ultimate Guide - How to Write Better SQL Queries?

Jan 12, 2024, 12:15 PM

Set-based versus procedural approaches to querying

Implicit in the anti-patterns discussed earlier is the difference between set-based and procedural approaches to building queries.

  • The procedural approach to querying is very similar to programming: you tell the system what needs to be done and how to do it. As in the examples in the previous article, you query the database by executing one function and then calling another, or you use logic involving loops, conditions, and user-defined functions (UDFs) to obtain the final result. With this approach you often find yourself requesting one subset of the data, then another subset of that, and so on. It is also often referred to as step-by-step or row-by-row querying.
  • The set-based approach only requires you to specify what needs to be done. Your job is to state the conditions and requirements for the result set you want from the query. You do not need to care about how the data is retrieved: the database engine determines the best algorithms and logic to execute the query.

Since SQL is set-based, the set-based approach is usually more efficient than the procedural one, which also explains why, in some cases, SQL can work faster than application code.
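To make the contrast concrete, here is a minimal sketch in PostgreSQL. The `orders` table, its columns, and its rows are hypothetical and exist only for illustration; they are not part of this tutorial's data. The first statement adds up the amounts row by row in a PL/pgSQL loop; the second asks for the same total in one set-based statement and leaves the "how" to the engine.

```sql
-- Hypothetical demo table (not part of the tutorial's data set).
CREATE TEMP TABLE orders (
    id          serial PRIMARY KEY,
    customer_id integer NOT NULL,
    amount      numeric(10, 2) NOT NULL
);

INSERT INTO orders (customer_id, amount)
VALUES (1, 10.00), (1, 15.50), (2, 7.25);

-- Procedural approach: spell out HOW, one row at a time.
DO $$
DECLARE
    r     record;
    total numeric := 0;
BEGIN
    FOR r IN SELECT amount FROM orders WHERE customer_id = 1 LOOP
        total := total + r.amount;
    END LOOP;
    RAISE NOTICE 'total for customer 1: %', total;
END
$$;

-- Set-based approach: state WHAT you want and let the engine decide how.
SELECT sum(amount) AS total
FROM orders
WHERE customer_id = 1;
```

Both versions compute the same total. On three rows the difference is invisible, but the loop does its work one row at a time, while the single statement gives the optimizer the whole problem to plan.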

The set-based approach is also a skill that the data analysis industry expects you to master, and you will often need to switch between the two approaches. If you find procedural logic in your queries, consider whether that part should be rewritten.

From query to execution plan

Anti-patterns are not static: they evolve as you grow as a SQL developer, and with so much to consider, avoiding query anti-patterns and rewriting queries can be a daunting task. Tools that let you optimize your queries in a more structured way are therefore often helpful.

Thinking about performance requires not only a more structured approach, but also a more in-depth one.

This structured, in-depth approach is based primarily on the query plan. The query is first parsed into a "parse tree"; the resulting query plan defines exactly which algorithm is used for each operation and how the operations are coordinated.

Query Optimization

When optimizing a query, you will most likely need to inspect the plan generated by the optimizer manually, and then analyze your query again in light of that plan.

To get hold of the query plan, you need to use the tools provided by your database management system. Some of the tools you may have at your disposal:

  • Some packages include tools that generate a graphical representation of a query plan.
  • Other tools provide a textual description of the query plan.

Note that PostgreSQL distinguishes between EXPLAIN and EXPLAIN ANALYZE. With EXPLAIN, you only get a description of how the planner intends to execute the query, without running it. EXPLAIN ANALYZE actually executes the query and returns a report that compares the estimated plan with what really happened. Generally speaking, an actual execution plan comes from really running the query, while an estimated execution plan is worked out without executing it. The two describe the same logical plan, but the actual execution plan is more useful because it contains additional details and statistics about what happened when the query ran.

Next you will learn more about EXPLAIN and ANALYZE, and how to use these two commands to better understand your query plans and query performance. The examples use two tables: one_million and half_million.
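The article does not show how these two tables were built. If you want to follow along, one way to create comparable data in PostgreSQL is sketched below. Only the table names and the `counter` column come from the examples; the `payload` column, the row contents, and the use of `generate_series` are assumptions made for illustration, so your costs and row widths will not match the figures in this article exactly.

```sql
-- Assumed schema: only the table names and "counter" appear in the article.
CREATE TABLE one_million AS
SELECT g AS counter,
       md5(g::text) AS payload   -- hypothetical filler column
FROM generate_series(1, 1000000) AS g;

CREATE TABLE half_million AS
SELECT g AS counter
FROM generate_series(1, 500000) AS g;

-- A later plan in this article refers to one_million_counter_idx,
-- so an index on one_million(counter) is assumed to exist already.
-- Left unnamed, PostgreSQL generates that name automatically.
CREATE INDEX ON one_million (counter);
```

No index is created on half_million here, because the walkthrough adds it later to show how the plan changes.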

You can use EXPLAIN to see the plan for a query on the one_million table: put it directly in front of the query, and when you run it, the query plan is returned instead of the query results:

EXPLAIN
SELECT *
FROM one_million;

QUERY PLAN
_________________________________________________
Seq Scan on one_million
(cost=0.00..18584.82 rows=1025082 width=36)
(1 row)

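In the above example, the estimated cost of the query is 0.00..18584.82 (the startup cost followed by the total cost, in the planner's own cost units), the estimated number of rows is 1025082, and the estimated average row width is 36 bytes.

You can also use ANALYZE to update the statistics that the planner relies on: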

ANALYZE one_million;
EXPLAIN
SELECT *
FROM one_million;

QUERY PLAN
_________________________________________________
Seq Scan on one_million
(cost=0.00..18334.00 rows=1000000 width=37)
(1 row)
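The cost figure is not arbitrary. According to the PostgreSQL documentation, the estimated total cost of a plain sequential scan without a WHERE clause is the number of disk pages read times `seq_page_cost`, plus the number of rows scanned times `cpu_tuple_cost`. The following sketch recomputes that estimate from the catalog statistics; it assumes the table is named one_million as above and that the statistics are current.

```sql
-- Recompute the planner's sequential-scan estimate from catalog statistics.
SELECT relpages,
       reltuples,
       relpages  * current_setting('seq_page_cost')::float8
     + reltuples * current_setting('cpu_tuple_cost')::float8
         AS estimated_seq_scan_cost
FROM pg_class
WHERE relname = 'one_million';
```

With the default settings (`seq_page_cost` = 1.0 and `cpu_tuple_cost` = 0.01), the total cost of 18334.00 shown above is consistent with 8,334 pages and 1,000,000 rows: 8334 × 1.0 + 1000000 × 0.01 = 18334.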

In addition to EXPLAIN and ANALYZE, you can also use EXPLAIN ANALYZE to retrieve the actual execution time:

EXPLAIN ANALYZE
SELECT *
FROM one_million;

QUERY PLAN
__________________________________________________
Seq Scan on one_million
(cost=0.00..18334.00 rows=1000000 width=37)
(actual time=0.015..1207.019 rows=1000000 loops=1)
Total runtime: 2320.146 ms
(2 rows)

Keep in mind the downside of EXPLAIN ANALYZE: the query is actually executed, so a slow query takes its full time to run and a data-modifying statement really changes your data.
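This matters most for statements that modify data. One common safeguard, described in the PostgreSQL documentation, is to wrap the statement in a transaction and roll it back afterwards. A minimal sketch follows; the DELETE is a made-up example, not a statement from this tutorial.

```sql
BEGIN;

-- The DELETE is really executed so that actual timings can be collected...
EXPLAIN ANALYZE
DELETE FROM half_million
WHERE counter <= 10;

-- ...but rolling back discards its effects on the table.
ROLLBACK;
```

The statement still costs its full execution time; the rollback only undoes the data changes.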

The only algorithm we have seen so far is the sequential scan (Seq Scan), also called a full table scan: every row of the table is read in sequential (serial) order, and its columns are checked against the query's conditions. In terms of performance, a sequential scan is often not the best execution plan, because the entire table has to be read. It is not too bad either, though: sequential reads are fairly fast even on slow disks.

Other algorithms appear once the query does more work, such as a join:

EXPLAIN ANALYZE
SELECT *
FROM one_million JOIN half_million
ON (one_million.counter=half_million.counter);

QUERY PLAN
_____________________________________________________________
Hash Join (cost=15417.00..68831.00 rows=500000 width=42)
(actual time=1241.471..5912.553 rows=500000 loops=1)
Hash Cond: (one_million.counter = half_million.counter)
    -> Seq Scan on one_million
    (cost=0.00..18334.00 rows=1000000 width=37)
    (actual time=0.007..1254.027 rows=1000000 loops=1)
    -> Hash (cost=7213.00..7213.00 rows=500000 width=5)
    (actual time=1241.251..1241.251 rows=500000 loops=1)
    Buckets: 4096 Batches: 16 Memory Usage: 770kB
        -> Seq Scan on half_million
        (cost=0.00..7213.00 rows=500000 width=5)
        (actual time=0.008..601.128 rows=500000 loops=1)
Total runtime: 6468.337 ms

We can see that the query optimizer selected a Hash Join. Remember this operation, because we will need it to evaluate the time complexity of the query. Note that there is no index on half_million.counter in the above example; the following example adds one:

CREATE INDEX ON half_million(counter);
EXPLAIN ANALYZE
SELECT *
FROM one_million JOIN half_million
ON (one_million.counter=half_million.counter);

QUERY PLAN
______________________________________________________________
Merge Join (cost=4.12..37650.65 rows=500000 width=42)
(actual time=0.033..3272.940 rows=500000 loops=1)
Merge Cond: (one_million.counter = half_million.counter)
    -> Index Scan using one_million_counter_idx on one_million
    (cost=0.00..32129.34 rows=1000000 width=37)
    (actual time=0.011..694.466 rows=500001 loops=1)
    -> Index Scan using half_million_counter_idx on half_million
    (cost=0.00..14120.29 rows=500000 width=5)
    (actual time=0.010..683.674 rows=500000 loops=1)
Total runtime: 3833.310 ms
(5 rows)

Now that the index exists, the query optimizer has chosen a Merge Join that is fed by index scans on both tables.

Note the difference between an index scan and a full table scan (sequential scan): a sequential scan, also called a "table scan", reads every row of the table, while an index scan walks the index to locate the matching entries and fetches only the corresponding rows from the table.
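A selective predicate on the indexed column is the typical case in which the planner may prefer an index scan. The sketch below assumes the index on half_million(counter) created above; the value 42 is arbitrary. The plan you get depends on your data, statistics, and PostgreSQL version, so no output is shown.

```sql
-- List the indexes on the table; the unnamed CREATE INDEX above
-- should appear with an automatically generated name.
SELECT indexname, indexdef
FROM pg_indexes
WHERE tablename = 'half_million';

-- Equality on the indexed column: a candidate for an index scan
-- (or an index-only scan, depending on the columns involved).
EXPLAIN
SELECT *
FROM half_million
WHERE counter = 42;

-- No predicate: every row is needed, so a sequential scan is typical.
EXPLAIN
SELECT *
FROM half_million;
```

Comparing the two plans side by side shows which access path the planner considers cheaper for each query.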

This concludes the second part of the tutorial. The final article in the series "How to Write Better SQL Queries" will follow, so stay tuned.

If you reprint this article, please credit the source: Grape City Control
