Unicode Double-Encoding Correction in UTF-8 Tables
Seeing "Ã±" where "ñ" should appear usually indicates that UTF-8 text has been double-encoded. This typically happens when a CSV file that is already UTF-8 is loaded as if it were Latin1-encoded: each byte of a multibyte character is treated as a separate Latin1 character, and each of those characters is then encoded in UTF-8 again.
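The byte-level effect can be reproduced in MySQL itself. The following is a minimal sketch rather than part of the original fix; the bytes are written in hex so the result does not depend on the client connection character set.

<code class="sql">
-- 'ñ' stored correctly in UTF-8 is the two bytes C3 B1.
-- Reading those bytes as Latin1 yields the two characters 'Ã' and '±';
-- encoding those two characters as UTF-8 again yields four bytes.
SELECT HEX(CONVERT(CONVERT(UNHEX('C3B1') USING latin1) USING utf8)) AS double_encoded;
-- double_encoded: C383C2B1
</code>

The four bytes C3 83 C2 B1 are what a double-encoded "ñ" looks like on disk, and they display as "Ã±".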
Solution
To reverse the double-encoding, use the following MySQL expression:
<code class="sql">CONVERT(CAST(CONVERT(field USING latin1) AS BINARY) USING utf8)</code>
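This expression works in three steps. Converting the field to Latin1 undoes the second, unwanted UTF-8 encoding and leaves the original UTF-8 bytes. Casting the result to BINARY strips the Latin1 label so that the bytes are not transcoded again. The final conversion then reinterprets those bytes as UTF-8, which removes the double-encoding.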
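To see the effect on a single value, the expression can be applied to a hard-coded double-encoded "ñ". This is an illustrative sketch; the derived table `demo` and its column `s` are hypothetical names used only for the demonstration.

<code class="sql">
SELECT
  HEX(s)         AS stored_bytes,    -- C383C2B1 (double-encoded)
  CHAR_LENGTH(s) AS stored_chars,    -- 2 ('Ã' and '±')
  HEX(CONVERT(CAST(CONVERT(s USING latin1) AS BINARY) USING utf8))         AS repaired_bytes,  -- C3B1
  CHAR_LENGTH(CONVERT(CAST(CONVERT(s USING latin1) AS BINARY) USING utf8)) AS repaired_chars   -- 1 ('ñ')
FROM (SELECT CONVERT(UNHEX('C383C2B1') USING utf8) AS s) AS demo;
</code>

The repaired value occupies two bytes and counts as one character, which is the correct UTF-8 form of "ñ".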
Correction via UPDATE Statement
To correct the affected fields, apply the expression in an UPDATE statement:
<code class="sql">UPDATE tablename SET field = CONVERT(CAST(CONVERT(field USING latin1) AS BINARY) USING utf8);</code>
Executing this statement restores the affected characters to their correct UTF-8 representation.
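Because the UPDATE has no WHERE clause, it rewrites every row, so it may be worth previewing the result first. The query below is one possible check, assuming the same `tablename` and `field` placeholders used above; the filter on multibyte content and the sample size of 20 are illustrative choices, not part of the original solution.

<code class="sql">
-- Compare current and repaired values side by side before changing any data.
SELECT field AS current_value,
       CONVERT(CAST(CONVERT(field USING latin1) AS BINARY) USING utf8) AS repaired_value
FROM tablename
WHERE LENGTH(field) != CHAR_LENGTH(field)  -- only rows that contain multibyte characters
LIMIT 20;
</code>

Rows made up entirely of ASCII characters are unchanged by the conversion, so the filter only narrows the preview to rows where a difference can appear. If some rows in the sample already look correct in `current_value`, that suggests the table mixes correctly encoded and double-encoded data, and the UPDATE should be restricted to the affected rows.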