Joining Comma-Separated Data Columns: In-Depth Solutions
In database systems, it is often necessary to work with data stored in comma-separated columns. Normalizing the data into multiple tables is the ideal solution, but there are cases where that is not feasible. Here we present three ways to join against a comma-delimited column. The examples use SQL Server (T-SQL) syntax.
Normalization and Table Joins
Normalizing the data into separate tables is the most efficient approach. Instead of storing a list in one column, the second table holds one row for each individual value that would otherwise appear in the comma-separated list. The tables can then be joined through a foreign key relationship.
-- T1 Table
CREATE TABLE T1
(
    col1 varchar(2),
    col2 varchar(5),
    constraint pk1_t1 primary key (col1)
);

-- T2 Table
CREATE TABLE T2
(
    col1 varchar(2),
    col2 varchar(2),
    constraint pk1_t2 primary key (col1, col2),
    constraint fk1_col2 foreign key (col2) references t1 (col1)
);
Once normalized, the data can be easily queried using a join:
SELECT t2.col1, t1.col2
FROM t2
INNER JOIN t1
    ON t2.col2 = t1.col1;
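To make the normalized layout concrete, the sketch below loads a few rows into T1 and T2 and runs the same join. The values ('R1', 'C1', and so on) are hypothetical sample data chosen for illustration, and the script assumes the two CREATE TABLE statements above have already been run.

```sql
-- Hypothetical sample data; these values are illustrative assumptions.
INSERT INTO T1 (col1, col2) VALUES ('1', 'C1'), ('2', 'C2'), ('3', 'C3');

-- One row per (record, value) pair, rather than a single row such as 'R1' -> '1,2'.
INSERT INTO T2 (col1, col2) VALUES ('R1', '1'), ('R1', '2'), ('R2', '3');

SELECT t2.col1, t1.col2
FROM T2 AS t2
INNER JOIN T1 AS t1
    ON t2.col2 = t1.col1
ORDER BY t2.col1, t1.col2;
-- Rows returned: (R1, C1), (R1, C2), (R2, C3)
```

Because each value lives in its own row, the foreign key can reject values that do not exist in T1, which a comma-separated string cannot do.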
Custom Split Function for Non-Normalized Data
If normalization is not possible, we can create a custom split function to convert the comma-separated data into individual rows.
CREATE FUNCTION [dbo].[Split](@String varchar(MAX), @Delimiter char(1))
RETURNS @temptable TABLE (items varchar(MAX))
AS
BEGIN
    DECLARE @idx int
    DECLARE @slice varchar(8000)

    SELECT @idx = 1
    -- Nothing to split for NULL or empty input
    IF LEN(@String) < 1 OR @String IS NULL RETURN

    WHILE @idx != 0
    BEGIN
        -- Position of the next delimiter (0 when none is left)
        SET @idx = CHARINDEX(@Delimiter, @String)
        IF @idx != 0
            SET @slice = LEFT(@String, @idx - 1)
        ELSE
            SET @slice = @String

        -- Skip empty slices, such as those produced by ',,'
        IF (LEN(@slice) > 0)
            INSERT INTO @temptable(items) VALUES (@slice)

        -- Drop the consumed slice and its delimiter
        SET @String = RIGHT(@String, LEN(@String) - @idx)
        IF LEN(@String) = 0 BREAK
    END
    RETURN
END;
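A quick way to check the function is to call it on a literal list. The sketch below assumes dbo.Split has been created as shown above; the input strings are made up for illustration. The last query uses the built-in STRING_SPLIT function, which is a swapped-in alternative available in SQL Server 2016 and later (database compatibility level 130 or higher), not part of the approach described here.

```sql
-- Split a simple list into rows.
SELECT items FROM dbo.Split('1,2,5', ',');
-- Three rows: 1, 2, 5

-- Empty slices are dropped by the LEN(@slice) > 0 check.
SELECT items FROM dbo.Split('1,,5,', ',');
-- Two rows: 1, 5

-- Built-in alternative (SQL Server 2016+). Note that it keeps empty
-- slices and exposes the result in a column named value.
SELECT value FROM STRING_SPLIT('1,2,5', ',');
```

Where STRING_SPLIT is available, it may be preferable to a hand-written loop, but the custom function works on older versions.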
Using this function, we can join the original table with the split data:
;WITH cte AS
(
    SELECT c.col1, t1.col2
    FROM t1
    INNER JOIN
    (
        -- Expand each comma-separated list into one row per value
        SELECT t2.col1, i.items col2
        FROM t2
        CROSS APPLY dbo.split(t2.col2, ',') i
    ) c
        ON t1.col1 = c.col2
)
SELECT DISTINCT c.col1,
    -- Re-concatenate the matched values; STUFF removes the leading ', ' (2 characters)
    STUFF(
        (SELECT DISTINCT ', ' + c1.col2
         FROM cte c1
         WHERE c.col1 = c1.col1
         FOR XML PATH('')), 1, 2, '') col2
FROM cte c;
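On SQL Server 2017 or later, the same split, join, and re-concatenate pattern can be sketched with the built-in STRING_SPLIT and STRING_AGG functions instead of the custom function and FOR XML PATH. This is an alternative technique, not the method shown above. The table variables @t1 and @t2 and their contents are hypothetical stand-ins for the non-normalized tables.

```sql
-- Hypothetical stand-ins for the non-normalized tables.
DECLARE @t1 TABLE (col1 varchar(2) PRIMARY KEY, col2 varchar(5));
DECLARE @t2 TABLE (col1 varchar(2) PRIMARY KEY, col2 varchar(50));

INSERT INTO @t1 (col1, col2) VALUES ('1', 'C1'), ('2', 'C2'), ('5', 'C5');
INSERT INTO @t2 (col1, col2) VALUES ('R1', '1,2,5'), ('R2', '2');

SELECT t2.col1,
       -- Concatenate the looked-up values back into one string per row
       STRING_AGG(t1.col2, ', ') WITHIN GROUP (ORDER BY t1.col2) AS col2
FROM @t2 AS t2
CROSS APPLY STRING_SPLIT(t2.col2, ',') AS s   -- one row per list element
INNER JOIN @t1 AS t1
    ON t1.col1 = s.value
GROUP BY t2.col1;
-- Rows returned: (R1, 'C1, C2, C5'), (R2, 'C2')
```

Unlike the DISTINCT in the FOR XML PATH version, STRING_AGG does not remove duplicates, so a list such as '2,2' would produce 'C2, C2' here.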
FOR XML PATH Direct Application
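Another method skips the split function entirely: it matches each t1.col1 value against the delimited string with LIKE and concatenates the matches with FOR XML PATH. Note that XQuery function names are case-sensitive, so substring and text() must be written in lowercase:
SELECT col1,
    (
        SELECT ', ' + t1.col2
        FROM t1
        -- Wrap both sides in commas so that only whole list elements match
        WHERE ',' + t2.col2 + ',' LIKE '%,' + CAST(t1.col1 AS VARCHAR(10)) + ',%'
        FOR XML PATH(''), TYPE
    ).value('substring(text()[1], 3)', 'VARCHAR(MAX)') AS col2
FROM t2;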
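The reason for wrapping both the list and the search value in commas can be seen in isolation. The sketch below uses a made-up list, '1,12,5', and compares a naive LIKE match with the comma-delimited one.

```sql
-- Hypothetical list used only to illustrate the matching rule.
DECLARE @list varchar(50) = '1,12,5';

SELECT
    -- Naive match: '2' is found inside '12', a false positive
    CASE WHEN @list LIKE '%2%'
         THEN 'match' ELSE 'no match' END AS naive_match,
    -- Delimited match: ',1,12,5,' does not contain ',2,'
    CASE WHEN ',' + @list + ',' LIKE '%,2,%'
         THEN 'match' ELSE 'no match' END AS delimited_match;
-- naive_match = 'match', delimited_match = 'no match'
```

The added leading and trailing commas also let the first and last elements of the list match the same pattern as the ones in the middle.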
Conclusion
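The optimal solution depends on the specific scenario. Normalizing the data is the most efficient option because the join can use keys and indexes directly. If that is not feasible, a custom split function or the direct FOR XML PATH approach produces the same result, although both must process the delimited string for every row, and the LIKE pattern with a leading wildcard cannot be satisfied by an index seek.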