從SQL 中的全名字段解析名字、中間名和姓氏
簡介
本討論圍繞著資料處理的一個常見挑戰:從單一全名中提取名字、中間名和姓氏使用SQL的字段。本文探討了旨在處理 90% 典型情況的實用解決方案。
方法
提出的方法涉及一系列嵌套的子查詢,這些子查詢分解了 fullname 欄位分成其各個部分。它假設全名的格式為“First Middle Last”,中間名是可選的。
示例
以下SQL 示例演示了該方法:
SELECT FIRST_NAME.ORIGINAL_INPUT_DATA ,FIRST_NAME.TITLE ,FIRST_NAME.FIRST_NAME ,CASE WHEN 0 = CHARINDEX(' ',FIRST_NAME.REST_OF_NAME) THEN NULL --no more spaces? assume rest is the last name ELSE SUBSTRING( FIRST_NAME.REST_OF_NAME ,1 ,CHARINDEX(' ',FIRST_NAME.REST_OF_NAME)-1 ) END AS MIDDLE_NAME ,SUBSTRING( FIRST_NAME.REST_OF_NAME ,1 + CHARINDEX(' ',FIRST_NAME.REST_OF_NAME) ,LEN(FIRST_NAME.REST_OF_NAME) ) AS LAST_NAME FROM ( SELECT TITLE.TITLE ,CASE WHEN 0 = CHARINDEX(' ',TITLE.REST_OF_NAME) THEN TITLE.REST_OF_NAME --No space? return the whole thing ELSE SUBSTRING( TITLE.REST_OF_NAME ,1 ,CHARINDEX(' ',TITLE.REST_OF_NAME)-1 ) END AS FIRST_NAME ,CASE WHEN 0 = CHARINDEX(' ',TITLE.REST_OF_NAME) THEN NULL --no spaces @ all? then 1st name is all we have ELSE SUBSTRING( TITLE.REST_OF_NAME ,CHARINDEX(' ',TITLE.REST_OF_NAME)+1 ,LEN(TITLE.REST_OF_NAME) ) END AS REST_OF_NAME ,TITLE.ORIGINAL_INPUT_DATA FROM ( SELECT --if the first three characters are in this list, --then pull it as a "title". otherwise return NULL for title. CASE WHEN SUBSTRING(TEST_DATA.FULL_NAME,1,3) IN ('MR ','MS ','DR ','MRS') THEN LTRIM(RTRIM(SUBSTRING(TEST_DATA.FULL_NAME,1,3))) ELSE NULL END AS TITLE --if you change the list, don't forget to change it here, too. --so much for the DRY prinicple... ,CASE WHEN SUBSTRING(TEST_DATA.FULL_NAME,1,3) IN ('MR ','MS ','DR ','MRS') THEN LTRIM(RTRIM(SUBSTRING(TEST_DATA.FULL_NAME,4,LEN(TEST_DATA.FULL_NAME)))) ELSE LTRIM(RTRIM(TEST_DATA.FULL_NAME)) END AS REST_OF_NAME ,TEST_DATA.ORIGINAL_INPUT_DATA FROM ( SELECT --trim leading & trailing spaces before trying to process --disallow extra spaces *within* the name REPLACE(REPLACE(LTRIM(RTRIM(FULL_NAME)),' ',' '),' ',' ') AS FULL_NAME ,FULL_NAME AS ORIGINAL_INPUT_DATA FROM ( --if you use this, then replace the following --block with your actual table SELECT 'GEORGE W BUSH' AS FULL_NAME UNION SELECT 'SUSAN B ANTHONY' AS FULL_NAME UNION SELECT 'ALEXANDER HAMILTON' AS FULL_NAME UNION SELECT 'OSAMA BIN LADEN JR' AS FULL_NAME UNION SELECT 'MARTIN J VAN BUREN SENIOR III' AS FULL_NAME UNION SELECT 'TOMMY' AS FULL_NAME UNION SELECT 'BILLY' AS FULL_NAME UNION SELECT NULL AS FULL_NAME UNION SELECT ' ' AS FULL_NAME UNION SELECT ' JOHN JACOB SMITH' AS FULL_NAME UNION SELECT ' DR SANJAY GUPTA' AS FULL_NAME UNION SELECT 'DR JOHN S HOPKINS' AS FULL_NAME UNION SELECT ' MRS SUSAN ADAMS' AS FULL_NAME UNION SELECT ' MS AUGUSTA ADA KING ' AS FULL_NAME ) RAW_DATA ) TEST_DATA ) TITLE ) FIRST_NAME
特別案例
處理特殊情況,例如缺失值、尾隨空格和超過三部分的名稱,可以提高結果的準確性。
結論
此方法為從 SQL 中的全名字段解析名字、中間名和姓氏提供了堅實的基礎,解決了典型的和特殊情況。透過使解決方案適應特定要求,您可以在名稱匹配和資料分析效率方面實現顯著提高。
以上是如何從 SQL 中的單一全名字段解析名字、中間名和姓氏?的詳細內容。更多資訊請關注PHP中文網其他相關文章!