Fixing Double-Encoded UTF8 Characters in an UTF-8 Table
A previous import operation using LOAD DATA INFILE incorrectly assumed that the input CSV file was Latin1 encoded. This led to multibyte characters being split into two single-byte characters and subsequently double-encoded in UTF-8, creating anomalies such as 'ñ' instead of 'ñ'.
To rectify these misencoded strings, MySQL provides a solution using the CONVERT() function:
CONVERT(CAST(CONVERT(field USING latin1) AS BINARY) USING utf8)
This function takes the double-encoded field and sequentially converts it from Latin1 (assuming the original file encoding) to binary representation and finally to UTF-8, effectively undoing the double encoding.
To apply this correction, an UPDATE statement can be executed:
UPDATE tablename SET field = CONVERT(CAST(CONVERT(field USING latin1) AS BINARY) USING utf8);
This statement will replace the existing field values with the corrected ones, restoring the intended UTF-8 representation of the multibyte characters.
The above is the detailed content of How to Fix Double-Encoded UTF8 Characters in a MySQL Table?. For more information, please follow other related articles on the PHP Chinese website!