Performance tuning of SELECT statements can be very time-consuming, and in my opinion it follows the Pareto principle: 20% of the effort will probably give you 80% of the performance improvement, and it may take 80% of the time to get the remaining 20%. Unless you work on Venus, where a day lasts 243 Earth days, there's a good chance that delivery deadlines leave you with too little time to tune your SQL queries.
Based on my years of experience writing and running SQL statements, I began developing a checklist that I refer to when trying to improve query performance. I go through it before I start studying query plans and reading the documentation for the database I'm using, which can sometimes be complex. My checklist is by no means comprehensive or scientific; it's more of a back-of-the-envelope calculation. Still, I can say that by following these simple steps I get performance improvements most of the time. The checklist follows.
Check indexes
Indexes should be added to all fields used in the WHERE and JOIN parts of the SQL statement. Take this 3-minute SQL performance test. Regardless of your score, be sure to read through the answers, as they are informative.
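One way to check whether an index is actually used is to compare the query plan before and after creating it. The sketch below uses Python's built-in sqlite3 module; the orders table, its columns, and the index name idx_orders_customer_id are hypothetical, and other databases expose their plans through different commands.

```python
import sqlite3

# Hypothetical schema: an orders table that is filtered by customer_id.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, amount REAL)"
)
conn.executemany(
    "INSERT INTO orders (customer_id, amount) VALUES (?, ?)",
    [(i % 100, i * 1.5) for i in range(10_000)],
)

query = "SELECT amount FROM orders WHERE customer_id = 42"


def plan(sql):
    """Return the human-readable plan steps SQLite reports for a statement."""
    # The fourth column of each EXPLAIN QUERY PLAN row is the step description.
    return [row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + sql)]


plan_before = plan(query)  # without an index: a scan of the whole table
conn.execute("CREATE INDEX idx_orders_customer_id ON orders (customer_id)")
plan_after = plan(query)   # with the index: a search that names the index

print(plan_before)
print(plan_after)
```

The exact plan wording depends on the SQLite version, so the useful signal is whether the plan changes from a scan to an index search.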
Limit the size of the working data set
Check the tables used in the SELECT statement to see whether you can apply a WHERE clause to filter them. A typical example is a query that performs well when the table holds only a few thousand rows but slows down as the application grows. The solution may be as simple as limiting the query to the current month's data.
When your query contains a subquery, apply the filter on the inner statement of the subquery rather than on the outer statement.
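To illustrate the subquery advice, here is a minimal sketch using Python's sqlite3 module. The sales table, its columns, and the sample dates are made up for the example. Both queries return the same totals for one month, but the second filters inside the subquery so that only that month's rows are aggregated. Some optimizers push such filters down on their own, so treat this as something to verify on your database rather than a guaranteed gain.

```python
import sqlite3

# Hypothetical sales table with a text date column in ISO format.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sales (id INTEGER PRIMARY KEY, customer_id INTEGER, "
    "sale_date TEXT, amount REAL)"
)
conn.executemany(
    "INSERT INTO sales (customer_id, sale_date, amount) VALUES (?, ?, ?)",
    [
        (1, "2024-04-28", 10.0),
        (1, "2024-05-03", 20.0),
        (2, "2024-05-10", 5.0),
        (2, "2024-03-15", 7.5),
        (3, "2024-05-21", 12.5),
    ],
)

# Filter on the outer statement: the subquery aggregates every month first.
outer_filtered = conn.execute("""
    SELECT customer_id, total
    FROM (
        SELECT customer_id, substr(sale_date, 1, 7) AS month, SUM(amount) AS total
        FROM sales
        GROUP BY customer_id, substr(sale_date, 1, 7)
    )
    WHERE month = '2024-05'
    ORDER BY customer_id
""").fetchall()

# Filter on the inner statement: only May rows reach the aggregation.
inner_filtered = conn.execute("""
    SELECT customer_id, total
    FROM (
        SELECT customer_id, SUM(amount) AS total
        FROM sales
        WHERE sale_date >= '2024-05-01' AND sale_date < '2024-06-01'
        GROUP BY customer_id
    )
    ORDER BY customer_id
""").fetchall()

print(outer_filtered)
print(inner_filtered)
```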
Select only the fields you need
Extra fields usually make the returned data finer-grained, so more detailed rows are sent back to the SQL client. Also:
• When using applications with reporting and analysis capabilities, reporting performance is sometimes poor because the reporting tool must aggregate the data it receives in detailed form.
• Occasionally the query itself runs fast enough, and the real problem is network-related: large amounts of detailed data are being sent over the network to the reporting server.
• When using a column-oriented DBMS, only the columns you select are read from disk. The fewer columns you include in your query, the smaller the IO overhead.
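The effect of an extra field on the grain of the result can be seen in a small sketch like the one below (Python's sqlite3 module; the sales table and its rows are invented for illustration). Adding the product column forces one row per region and product, while selecting only what the report needs returns one row per region.

```python
import sqlite3

# Hypothetical sales data: a report only needs totals per region.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, product TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?)",
    [
        ("North", "Widget", 10.0),
        ("North", "Gadget", 15.0),
        ("South", "Widget", 7.0),
        ("South", "Gadget", 3.0),
        ("South", "Gizmo", 5.0),
    ],
)

# With the extra field, the result has one row per region/product pair,
# and the client still has to aggregate it down to regions.
detailed = conn.execute(
    "SELECT region, product, SUM(amount) FROM sales "
    "GROUP BY region, product ORDER BY region, product"
).fetchall()

# With only the needed fields, the database returns one row per region.
summary = conn.execute(
    "SELECT region, SUM(amount) FROM sales GROUP BY region ORDER BY region"
).fetchall()

print(len(detailed), "detailed rows")
print(summary)
```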
Remove unnecessary tables
The reason for removing unnecessary tables is the same as the reason for removing unnecessary fields in the query statement.
Writing SQL is usually an iterative process of writing and testing statements. During development, you might add tables to a query that end up having no impact on the data it returns. Once the SQL is running correctly, I find that many people don't review their scripts and remove the tables that have no effect on the final result. By removing JOINs to unnecessary tables, you reduce the amount of processing the database must perform. Sometimes, much like removing columns, you'll find that you also reduce the amount of data the database returns.
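Before dropping a join, it helps to confirm that the result really is unchanged, because an inner join can silently filter or duplicate rows. The following sketch (Python's sqlite3 module, with hypothetical customers, regions, and orders tables) compares a query that still joins an unused regions table with the trimmed version.

```python
import sqlite3

# Hypothetical schema: regions was joined during development but is never selected.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE regions (region_id INTEGER PRIMARY KEY, region_name TEXT);
    CREATE TABLE customers (
        customer_id INTEGER PRIMARY KEY, name TEXT, region_id INTEGER NOT NULL
    );
    CREATE TABLE orders (
        order_id INTEGER PRIMARY KEY, customer_id INTEGER NOT NULL, amount REAL
    );
    INSERT INTO regions VALUES (1, 'North'), (2, 'South');
    INSERT INTO customers VALUES (1, 'John Doe', 1), (2, 'Mary Jane', 2);
    INSERT INTO orders VALUES (1, 1, 10.0), (2, 1, 5.0), (3, 2, 7.5);
""")

# Original query: the join to regions contributes nothing to the output.
with_extra_join = conn.execute("""
    SELECT c.name, SUM(o.amount)
    FROM orders o
    JOIN customers c ON c.customer_id = o.customer_id
    JOIN regions r ON r.region_id = c.region_id
    GROUP BY c.name
    ORDER BY c.name
""").fetchall()

# Trimmed query: same output with one join fewer for the database to perform.
trimmed = conn.execute("""
    SELECT c.name, SUM(o.amount)
    FROM orders o
    JOIN customers c ON c.customer_id = o.customer_id
    GROUP BY c.name
    ORDER BY c.name
""").fetchall()

print(with_extra_join == trimmed)
```

The results match here only because every customer has exactly one matching region; if that were not guaranteed, removing the join would change the data.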
Remove OUTER JOINs
This is easier said than done, and it depends on how much influence you have over the contents of the tables. One solution is to remove the OUTER JOIN by adding placeholder rows to both tables. Suppose you have the following tables, joined with an OUTER JOIN to ensure that all data is returned:
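Customer table:
CUSTOMER_ID | CUSTOMER_NAME |
---|---|
 | John Doe |
 | Mary Jane |
 | Peter Pan |
 | Joe Soap |
Sales table:
CUSTOMER_ID | SALES_PERSON |
---|---|
 | Newbee Smith |
 | Oldie Jones |
 | Another Oldie |
 | Greenhorn |
Customer table with a placeholder row:
CUSTOMER_ID | CUSTOMER_NAME |
---|---|
 | NO CUSTOMER |
 | John Doe |
 | Mary Jane |
 | Peter Pan |
Sales table referencing the placeholder key:
CUSTOMER_ID | SALES_PERSON |
---|---|
2 | |
1 | |
0 | Greenhorn |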
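The placeholder approach can be sketched end to end with Python's sqlite3 module. The table names customer and sales, the customer IDs 1 to 4, the NULL keys for salespeople without a customer, and the placeholder key 0 are all assumptions made for this illustration. The point is that once the placeholder row exists and the missing keys point to it, an inner join returns the same rows the OUTER JOIN did.

```python
import sqlite3

# Assumed data: two salespeople have no customer yet (NULL customer_id).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customer (customer_id INTEGER PRIMARY KEY, customer_name TEXT);
    CREATE TABLE sales (customer_id INTEGER, sales_person TEXT);
    INSERT INTO customer VALUES
        (1, 'John Doe'), (2, 'Mary Jane'), (3, 'Peter Pan'), (4, 'Joe Soap');
    INSERT INTO sales VALUES
        (NULL, 'Newbee Smith'), (2, 'Oldie Jones'),
        (1, 'Another Oldie'), (NULL, 'Greenhorn');
""")

# Before: an OUTER JOIN is needed to keep salespeople without a customer.
outer_rows = conn.execute("""
    SELECT s.sales_person, COALESCE(c.customer_name, 'NO CUSTOMER')
    FROM sales s
    LEFT OUTER JOIN customer c ON c.customer_id = s.customer_id
    ORDER BY s.sales_person
""").fetchall()

# Add the placeholder customer and point the missing keys at it.
conn.execute("INSERT INTO customer VALUES (0, 'NO CUSTOMER')")
conn.execute("UPDATE sales SET customer_id = 0 WHERE customer_id IS NULL")

# After: a plain inner join returns every salesperson.
inner_rows = conn.execute("""
    SELECT s.sales_person, c.customer_name
    FROM sales s
    JOIN customer c ON c.customer_id = s.customer_id
    ORDER BY s.sales_person
""").fetchall()

print(outer_rows == inner_rows)
```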