Remove duplicate rows when doing self join in SQL
P粉203792468
P粉203792468 2024-02-03 22:11:59
0
1
500

I am trying to remove duplicate values ​​from rows based on a join on the same table, but cannot remove a value in a column in table1 that is mapped to the same column in table2.

surface: Note: You are not limited to only one date in the table, there can be multiple dates with multiple sid

date    | sid | comp | disc
-----------------------
23 june | 1  | az  | 20
23 june | 1  | ph  | 22
23 june | 1  | mg  | 10
23 june | 2  | mg  | 8
23 june | 3  | ph  | 15
23 june | 3  | az  | 11
------------------------

About self-joining

select t1.*, t2.comp as comp1, t2.disc as disc1
from table as t1
left join table as t2 on t1.date = t2.date and t1.sid = t2.sid and t1.comp <> t2.comp
Output from above query:

date    | sid | comp | disc | comp1 | disc1
-------------------------------------------
23 june | 1  | az  | 20     | ph    | 22
23 june | 1  | az  | 20     | mg    | 10
23 june | 1  | ph  | 22     | az    | 20
23 june | 1  | ph  | 20     | mg    | 10
23 june | 1  | mg  | 10     | mg    | 10
23 june | 2  | mg  | 10     | null  | null
23 june | 3  | ph  | 10     | az    | 11
23 june | 3  | az  | 11     | ph    | 10

Expected results: (Here I try to get a combination of comp and comp1 where for each sid and date there are only different comp values ​​mapped with mg and null in comp1 if the sid has only mg or non-mg values) NOTE: There will be no duplicate rows when comp is mapped to comp1

date    | sid | comp | disc | comp1 | disc1
-------------------------------------------
23 june | 1  | az  | 20     | mg    | 10
23 june | 1  | ph  | 20     | mg    | 10
23 june | 2  | mg  | 10     | null  | null
23 june | 3  | ph  | 10     | null  | null
23 june | 3  | az  | 11     | null  | null

P粉203792468
P粉203792468

reply all(1)
P粉384244473

View dbfiddle

WITH cte AS (
SELECT t1.*, t2.comp as comp1, t2.disc as disc1, SUM(1) OVER(PARTITION BY date,sid,comp) AS cnt
FROM `table` t1
LEFT JOIN `table` t2 ON t1.date = t2.date AND t1.sid = t2.sid AND t1.comp  t2.comp 
)

SELECT 
  date, sid, comp, disc, 
  CASE WHEN comp1  'mg' THEN NULL ELSE comp1 END AS comp1, 
  CASE WHEN comp1  'mg' THEN NULL ELSE disc1 END AS disc1
FROM cte
WHERE
 (CASE WHEN comp  'mg' OR comp1 IS NULL THEN cnt END) = 1
 OR 
 (CASE WHEN comp  'mg' AND comp1 = 'mg' THEN cnt END) >= 2
Latest Downloads
More>
Web Effects
Website Source Code
Website Materials
Front End Template