使用 SQL 从全名字段中解析名字、中间名和姓氏
处理数据时,通常需要将名字分成各自的名字更容易操作的组成部分。在这种情况下,我们需要从“全名”字段中提取名字、中间名和姓氏,同时考虑常见的数据变化。
准确率达 90% 的高效解决方案
提供的示例提供了一个实用的解决方案,可以高度处理大多数情况准确性:
SELECT FIRST_NAME.ORIGINAL_INPUT_DATA, FIRST_NAME.TITLE, FIRST_NAME.FIRST_NAME, CASE WHEN 0 = CHARINDEX(' ', FIRST_NAME.REST_OF_NAME) THEN NULL -- No more spaces? Assume rest is last name ELSE SUBSTRING(FIRST_NAME.REST_OF_NAME, 1, CHARINDEX(' ', FIRST_NAME.REST_OF_NAME) - 1) END AS MIDDLE_NAME, SUBSTRING(FIRST_NAME.REST_OF_NAME, 1 + CHARINDEX(' ', FIRST_NAME.REST_OF_NAME), LEN(FIRST_NAME.REST_OF_NAME)) AS LAST_NAME FROM ( SELECT TITLE.TITLE, CASE WHEN 0 = CHARINDEX(' ', TITLE.REST_OF_NAME) THEN TITLE.REST_OF_NAME -- No space? Return the whole thing ELSE SUBSTRING(TITLE.REST_OF_NAME, 1, CHARINDEX(' ', TITLE.REST_OF_NAME) - 1) END AS FIRST_NAME, CASE WHEN 0 = CHARINDEX(' ', TITLE.REST_OF_NAME) THEN NULL -- No spaces at all? Then 1st name is all we have ELSE SUBSTRING(TITLE.REST_OF_NAME, CHARINDEX(' ', TITLE.REST_OF_NAME) + 1, LEN(TITLE.REST_OF_NAME)) END AS REST_OF_NAME, TITLE.ORIGINAL_INPUT_DATA FROM ( SELECT -- If the first three characters are in this list, -- then pull it as a "title". Otherwise return NULL for title. CASE WHEN SUBSTRING(TEST_DATA.FULL_NAME, 1, 3) IN ('MR ', 'MS ', 'DR ', 'MRS') THEN LTRIM(RTRIM(SUBSTRING(TEST_DATA.FULL_NAME, 1, 3))) ELSE NULL END AS TITLE, -- If you change the list, don't forget to change it here, too. CASE WHEN SUBSTRING(TEST_DATA.FULL_NAME, 1, 3) IN ('MR ', 'MS ', 'DR ', 'MRS') THEN LTRIM(RTRIM(SUBSTRING(TEST_DATA.FULL_NAME, 4, LEN(TEST_DATA.FULL_NAME)))) ELSE LTRIM(RTRIM(TEST_DATA.FULL_NAME)) END AS REST_OF_NAME, TEST_DATA.ORIGINAL_INPUT_DATA FROM ( SELECT -- Trim leading & trailing spaces before trying to process -- Disallow extra spaces *within* the name REPLACE(REPLACE(LTRIM(RTRIM(FULL_NAME)), ' ', ' '), ' ', ' ') AS FULL_NAME, FULL_NAME AS ORIGINAL_INPUT_DATA FROM ( -- Replace this block with your actual table SELECT 'GEORGE W BUSH' AS FULL_NAME UNION SELECT 'SUSAN B ANTHONY' AS FULL_NAME UNION SELECT 'ALEXANDER HAMILTON' AS FULL_NAME UNION SELECT 'OSAMA BIN LADEN JR' AS FULL_NAME UNION SELECT 'MARTIN J VAN BUREN SENIOR III' AS FULL_NAME UNION SELECT 'TOMMY' AS FULL_NAME UNION SELECT 'BILLY' AS FULL_NAME ) RAW_DATA ) TEST_DATA ) TITLE ) FIRST_NAME;
此查询将“MR”、“MS”、“DR”和“MRS”等前缀作为单独的“TITLE”列进行识别和删除,处理缺失的名称、多个空格姓名,以及单部分“全名”(仅名字)。
特殊处理案例
该解决方案还包括针对特定特殊情况的修改,例如空的“全名”字段、尾随/前导空格、多个连续空格以及仅包含名字的“全名” :
-- Handle the following special cases: -- 1 - The NAME field is NULL -- 2 - The NAME field contains leading / trailing spaces -- 3 - The NAME field has > 1 consecutive space within the name -- 4 - The NAME field contains ONLY the first name -- 5 - Include the original full name in the final output as a separate column, for readability -- 6 - Handle a specific list of prefixes as a separate "title" column SELECT FIRST_NAME.ORIGINAL_INPUT_DATA, FIRST_NAME.TITLE, FIRST_NAME.FIRST_NAME, CASE WHEN 0 = CHARINDEX(' ', FIRST_NAME.REST_OF_NAME) THEN NULL -- No more spaces? Assume rest is last name ELSE SUBSTRING(FIRST_NAME.REST_OF_NAME, 1, CHARINDEX(' ', FIRST_NAME.REST_OF_NAME) - 1) END AS MIDDLE_NAME, SUBSTRING(FIRST_NAME.REST_OF_NAME, 1 + CHARINDEX(' ', FIRST_NAME.REST_OF_NAME), LEN(FIRST_NAME.REST_OF_NAME)) AS LAST_NAME FROM ( SELECT TITLE.TITLE, CASE WHEN 0 = CHARINDEX(' ', TITLE.REST_OF_NAME) THEN TITLE.REST_OF_NAME -- No space? Return the whole thing ELSE SUBSTRING(TITLE.REST_OF_NAME, 1, CHARINDEX(' ', TITLE.REST_OF_NAME) - 1) END AS FIRST_NAME, CASE WHEN 0 = CHARINDEX(' ', TITLE.REST_OF_NAME) THEN NULL -- No spaces at all? Then 1st name is all we have ELSE SUBSTRING(TITLE.REST_OF_NAME, CHARINDEX(' ', TITLE.REST_OF_NAME) + 1, LEN(TITLE.REST_OF_NAME)) END AS REST_OF_NAME, TITLE.ORIGINAL_INPUT_DATA FROM ( SELECT -- If the first three characters are in this list, -- then pull it as a "title". Otherwise return NULL for title. CASE WHEN SUBSTRING(TEST_DATA.FULL_NAME, 1, 3) IN ('MR ', 'MS ', 'DR ', 'MRS') THEN LTRIM(RTRIM(SUBSTRING(TEST_DATA.FULL_NAME, 1, 3))) ELSE NULL END AS TITLE, -- If you change the list, don't forget to change it here, too. CASE WHEN SUBSTRING(TEST_DATA.FULL_NAME, 1, 3) IN ('MR ', 'MS ', 'DR ', 'MRS') THEN LTRIM(RTRIM(SUBSTRING(TEST_DATA.FULL_NAME, 4, LEN(TEST_DATA.FULL_NAME)))) ELSE LTRIM(RTRIM(TEST_DATA.FULL_NAME)) END AS REST_OF_NAME, TEST_DATA.ORIGINAL_INPUT_DATA FROM ( SELECT -- Trim leading & trailing spaces before trying to process -- Disallow extra spaces *within* the name REPLACE(REPLACE(LTRIM(RTRIM(FULL_NAME)), ' ', ' '), ' ', ' ') AS FULL_NAME, FULL_NAME AS ORIGINAL_INPUT_DATA FROM ( -- Replace this block with your actual table SELECT 'GEORGE W BUSH' AS FULL_NAME UNION SELECT 'SUSAN B ANTHONY' AS FULL_NAME UNION SELECT 'ALEXANDER HAMILTON' AS FULL_NAME UNION SELECT 'OSAMA BIN LADEN JR' AS FULL_NAME UNION SELECT 'MARTIN J VAN BUREN SENIOR III' AS FULL_NAME UNION SELECT 'TOMMY' AS FULL_NAME UNION SELECT 'BILLY' AS FULL_NAME UNION SELECT NULL AS FULL_NAME UNION SELECT ' ' AS FULL_NAME UNION SELECT ' JOHN JACOB SMITH' AS FULL_NAME UNION SELECT ' DR SANJAY GUPTA' AS FULL_NAME UNION SELECT 'DR JOHN S HOPKINS' AS FULL_NAME UNION SELECT ' MRS SUSAN ADAMS' AS FULL_NAME UNION SELECT ' MS AUGUSTA ADA KING ' AS FULL_NAME ) RAW_DATA ) TEST_DATA ) TITLE ) FIRST_NAME;
以上是如何从 SQL 中的单个'全名”字段有效解析名字、中间名和姓氏,处理各种数据不一致和特殊情况?的详细内容。更多信息请关注PHP中文网其他相关文章!