Oracle删除重复记录的几种方式
如果把一个文件多次导入数据库,可能会引入重复记录,那么有哪些方法可以删除重复记录呢?
Oracle删除重复记录的几种方式
如果把一个文件多次导入数据库,可能会引入重复记录,那么有哪些方法可以删除重复记录呢?
REATE TABLE tbl_test(
SER_NO NUMBER,
FST_NM VARCHAR2(30),
DEPTID NUMBER,
CMNT VARCHAR2(30));
INSERT INTO tbl_test VALUES(1, 'aaaaa', 2004, 'xxx');
INSERT INTO tbl_test VALUES(2, 'bbbbb', 2005, 'yyy');
INSERT INTO tbl_test VALUES(1, 'aaaaa', 2004, 'xxx');
INSERT INTO tbl_test VALUES(1, 'aaaaa', 2004, 'xxx');
INSERT INTO tbl_test VALUES(3, 'ccccc', 2005, 'zzz');
INSERT INTO tbl_test VALUES(2, 'bbbbb', 2005, 'yyy');
1.Using MIN(rowid) 最常用的方法,但是数据量大的话执行会很长时间
DELETE FROM tbl_test
WHERE ROWID NOT IN (SELECT MIN (ROWID)
FROM tbl_test
GROUP BY ser_no, fst_nm, deptid, cmnt);
2.Using MIN(rowid) & Join 跟第一条差不多
DELETE FROM tbl_test t
WHERE t.ROWID NOT IN (SELECT MIN (b.ROWID)
FROM tbl_test b
WHERE b.ser_no = t.ser_no
AND b.fst_nm = t.fst_nm
AND b.deptid = t.deptid
AND b.cmnt = t.cmnt);
3.Using Subquery
DELETE FROM tbl_test
WHERE ser_no IN (SELECT ser_no FROM tbl_test GROUP BY ser_no, fst_nm, deptid, cmnt HAVING COUNT (*) > 1)
AND fst_nm IN (SELECT fst_nm FROM tbl_test GROUP BY ser_no, fst_nm, deptid, cmnt HAVING COUNT (*) > 1)
AND deptid IN (SELECT deptid FROM tbl_test GROUP BY ser_no, fst_nm, deptid, cmnt HAVING COUNT (*) > 1)
AND cmnt IN (SELECT cmnt FROM tbl_test GROUP BY ser_no, fst_nm, deptid, cmnt HAVING COUNT (*) > 1)
AND ROWID NOT IN (SELECT MIN (ROWID)
FROM tbl_test
GROUP BY ser_no, fst_nm, deptid, cmnt
HAVING COUNT (*) > 1)
4. Using Nested Subqueries
DELETE FROM tbl_test a WHERE (a.ser_no, a.fst_nm, a.deptid, a.cmnt) IN (SELECT b.ser_no, b.fst_nm, b.deptid, b.cmnt
FROM tbl_test b WHERE a.ser_no = b.ser_no AND a.fst_nm = b.fst_nm AND a.deptid = b.deptid AND a.cmnt = b.cmnt AND
a.ROWID > b.ROWID);
5. Using Analytic Fucntions: 对于大表这是最有效的方法
DELETE FROM tbl_test WHERE ROWID IN (SELECT rid FROM (SELECT ROWID rid,
ROW_NUMBER () OVER (PARTITION BY ser_no, fst_nm, deptid, cmnt ORDER BY ROWID) rn FROM tbl_test)WHERE rn 1);
6. CREATE-DROP-RENAME 对资源使用比较合理,,特别对于大表。但是如果需要回滚则会产生大量undo日志信息。
CREATE TABLE tbl_test1 NOLOGGING AS SELECT tbl_test .*
FROM tbl_test tbl_test WHERE ROWID IN (SELECT rid
FROM (SELECT ROWID rid, ROW_NUMBER() OVER (PARTITION BY ser_no, fst_nm, deptid, cmnt ORDER BY ROWID) rn
FROM tbl_test) WHERE rn=1);
DROP TABLE tbl_test; --drop the original table with lots of duplicate
RENAME tbl_test1 TO tbl_test; -- your original table without duplicates.

Hot AI Tools

Undresser.AI Undress
AI-powered app for creating realistic nude photos

AI Clothes Remover
Online AI tool for removing clothes from photos.

Undress AI Tool
Undress images for free

Clothoff.io
AI clothes remover

AI Hentai Generator
Generate AI Hentai for free.

Hot Article

Hot Tools

Notepad++7.3.1
Easy-to-use and free code editor

SublimeText3 Chinese version
Chinese version, very easy to use

Zend Studio 13.0.1
Powerful PHP integrated development environment

Dreamweaver CS6
Visual web development tools

SublimeText3 Mac version
God-level code editing software (SublimeText3)

Hot Topics

The function in Oracle to calculate the number of days between two dates is DATEDIFF(). The specific usage is as follows: Specify the time interval unit: interval (such as day, month, year) Specify two date values: date1 and date2DATEDIFF(interval, date1, date2) Return the difference in days

The retention period of Oracle database logs depends on the log type and configuration, including: Redo logs: determined by the maximum size configured with the "LOG_ARCHIVE_DEST" parameter. Archived redo logs: Determined by the maximum size configured by the "DB_RECOVERY_FILE_DEST_SIZE" parameter. Online redo logs: not archived, lost when the database is restarted, and the retention period is consistent with the instance running time. Audit log: Configured by the "AUDIT_TRAIL" parameter, retained for 30 days by default.

The Oracle database startup sequence is: 1. Check the preconditions; 2. Start the listener; 3. Start the database instance; 4. Wait for the database to open; 5. Connect to the database; 6. Verify the database status; 7. Enable the service (if necessary ); 8. Test the connection.

The INTERVAL data type in Oracle is used to represent time intervals. The syntax is INTERVAL <precision> <unit>. You can use addition, subtraction, multiplication and division operations to operate INTERVAL, which is suitable for scenarios such as storing time data and calculating date differences.

In Oracle, you can use the nested INSTR function to determine whether a string contains two substrings at the same time: when INSTR(string1, string2a) is greater than 0 and INSTR(string1, string2b) is greater than 0, it is included; otherwise, it is not included.

To find the number of occurrences of a character in Oracle, perform the following steps: Get the total length of a string; Get the length of the substring in which a character occurs; Count the number of occurrences of a character by subtracting the substring length from the total length.

Oracle database server hardware configuration requirements: Processor: multi-core, with a main frequency of at least 2.5 GHz. For large databases, 32 cores or more are recommended. Memory: At least 8GB for small databases, 16-64GB for medium sizes, up to 512GB or more for large databases or heavy workloads. Storage: SSD or NVMe disks, RAID arrays for redundancy and performance. Network: High-speed network (10GbE or higher), dedicated network card, low-latency network. Others: Stable power supply, redundant components, compatible operating system and software, heat dissipation and cooling system.

The amount of memory required by Oracle depends on database size, activity level, and required performance level: for storing data buffers, index buffers, executing SQL statements, and managing the data dictionary cache. The exact amount is affected by database size, activity level, and required performance level. Best practices include setting the appropriate SGA size, sizing SGA components, using AMM, and monitoring memory usage.
