Master SQL DISTINCT: Deleting duplicates makes it easy
SQL DISTINCT keyword explained: Efficiently removing duplicate rows
The DISTINCT keyword in SQL is mainly used to filter duplicate rows out of query results, so that every row in the returned result set is unique.
How DISTINCT works
A SELECT query sometimes returns results that contain duplicate rows. The DISTINCT keyword removes this redundant data, keeping only one row for each unique combination of values in the selected columns.
Syntax
<code class="sql">SELECT DISTINCT column1, column2, ... FROM table_name;</code>
Examples
1. Remove duplicate values
Suppose there is a table of employees called employees:
EmployeeID | Department |
---|---|
1 | hr |
2 | it |
3 | hr |
4 | Sales |
Run the following query:
<code class="sql">SELECT DISTINCT department FROM employees;</code>
Result:
Department |
---|
hr |
it |
Sales |
As you can see, the duplicate "hr" department now appears only once in the result.
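To reproduce this example, the script below creates the employees table with the sample data and runs the query. It is a minimal sketch: the column types are assumptions, since the article shows only the data. ORDER BY is added because SQL does not guarantee the order of result rows without it.

```sql
-- Sample employees table (column types are assumed; the article shows only data)
CREATE TABLE employees (
    EmployeeID INTEGER PRIMARY KEY,
    Department VARCHAR(50)
);

INSERT INTO employees (EmployeeID, Department) VALUES
    (1, 'hr'),
    (2, 'it'),
    (3, 'hr'),
    (4, 'Sales');

-- Each department value appears once in the result;
-- ORDER BY makes the row order predictable
SELECT DISTINCT Department
FROM employees
ORDER BY Department;
```

Because the sample values differ in letter case, where 'Sales' sorts relative to 'hr' and 'it' depends on the database's collation.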
2. Select unique combinations
Now consider an order table called orders:
OrderID | CustomerID | ProductID |
---|---|---|
101 | 1 | a |
102 | 1 | b |
103 | 1 | a |
104 | 2 | c |
Run the following query:
<code class="sql">SELECT DISTINCT CustomerID, ProductID FROM orders;</code>
Result:
CustomerID | ProductID |
---|---|
1 | a |
1 | b |
2 | c |
DISTINCT removes duplicate rows based on the combination of CustomerID and ProductID: orders 101 and 103 share the same pair (1, a), so that pair appears only once.
Application scenarios for DISTINCT
- Get unique values: Find all the distinct values in a column or combination of columns, for example, listing all the different product categories in a database.
- Remove redundant data: In data analysis or reporting where duplicate rows are not needed, for example, getting a list of unique department names from the employee table.
- Data cleaning: Produce a deduplicated version of a data set. Note that DISTINCT only filters query results; it does not delete rows from the underlying table.
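For the data-cleaning case, one common pattern is to write the output of SELECT DISTINCT into a new table. The sketch below uses a hypothetical raw_contacts table that contains an exact duplicate row. CREATE TABLE ... AS SELECT works in SQLite, PostgreSQL and MySQL; other systems may use different syntax (SQL Server, for example, uses SELECT ... INTO).

```sql
-- Hypothetical raw data containing an exact duplicate row
CREATE TABLE raw_contacts (
    ContactName VARCHAR(50),
    Email VARCHAR(100)
);

INSERT INTO raw_contacts (ContactName, Email) VALUES
    ('Alice', 'alice@example.com'),
    ('Bob', 'bob@example.com'),
    ('Alice', 'alice@example.com');

-- Build a cleaned copy; raw_contacts itself is left unchanged
CREATE TABLE clean_contacts AS
SELECT DISTINCT ContactName, Email
FROM raw_contacts;

-- clean_contacts holds one row each for Alice and Bob
SELECT ContactName, Email
FROM clean_contacts
ORDER BY ContactName;
```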
Limitations of DISTINCT
- Performance impact: DISTINCT can noticeably increase query execution time, especially on large datasets, because the database has to sort or hash the result rows to detect duplicates.
- No conditional deduplication: If you need to remove duplicates based on a specific condition (for example, keeping only the latest row for each unique value), DISTINCT alone is not enough; you need other techniques, such as the ROW_NUMBER() window function.
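As an example of conditional deduplication, the sketch below keeps only the most recent order for each customer using ROW_NUMBER(). It reuses the sample orders data and assumes, purely for illustration, that a higher OrderID means a more recent order; a real table would more likely be ordered by a date column. Window functions require a reasonably recent database version (for example, MySQL 8.0 or SQLite 3.25 and later).

```sql
-- Sample orders table (column types are assumed)
CREATE TABLE orders (
    OrderID INTEGER PRIMARY KEY,
    CustomerID INTEGER,
    ProductID VARCHAR(10)
);

INSERT INTO orders (OrderID, CustomerID, ProductID) VALUES
    (101, 1, 'a'),
    (102, 1, 'b'),
    (103, 1, 'a'),
    (104, 2, 'c');

-- Number each customer's orders from newest (highest OrderID) to oldest,
-- then keep only the first row per customer
SELECT OrderID, CustomerID, ProductID
FROM (
    SELECT OrderID, CustomerID, ProductID,
           ROW_NUMBER() OVER (PARTITION BY CustomerID ORDER BY OrderID DESC) AS rn
    FROM orders
) AS ranked
WHERE rn = 1
ORDER BY CustomerID;
```

With the sample data, this keeps order 103 for customer 1 and order 104 for customer 2, something a plain DISTINCT cannot express.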
Tips for Using DISTINCT
- Use DISTINCT only when necessary, since the extra deduplication work affects performance.
- For complex deduplication, consider GROUP BY (optionally combined with aggregate functions) or an analytic (window) function as an alternative.
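The GROUP BY alternative can be sketched with the sample employees data: grouping by Department yields the same set of values as SELECT DISTINCT Department, and it also lets you attach an aggregate such as COUNT(*) to each group. The column types are assumptions.

```sql
-- Sample employees table (column types are assumed)
CREATE TABLE employees (
    EmployeeID INTEGER PRIMARY KEY,
    Department VARCHAR(50)
);

INSERT INTO employees (EmployeeID, Department) VALUES
    (1, 'hr'),
    (2, 'it'),
    (3, 'hr'),
    (4, 'Sales');

-- Same set of values as: SELECT DISTINCT Department FROM employees
SELECT Department
FROM employees
GROUP BY Department
ORDER BY Department;

-- GROUP BY can also report an aggregate per group, which DISTINCT cannot
SELECT Department, COUNT(*) AS EmployeeCount
FROM employees
GROUP BY Department
ORDER BY Department;
```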
Summary
The DISTINCT keyword is a concise and powerful SQL tool for removing duplicate rows from query results, ensuring that every returned row is unique. When using it, weigh its performance cost and choose the technique that best fits your actual needs.
The above is the detailed content of Master SQL DISTINCT: Deleting duplicates makes it easy. For more information, please follow other related articles on the PHP Chinese website!