Unicode Character Set in MySQL: Understanding the 'Ä' Anomaly
MySQL's treatment of the characters 'ä', 'ö', and 'ü' as equal to the plain letters 'a', 'o', and 'u' when searching can be perplexing. To understand this behavior, we need to look at Unicode character sets and their collations.
In non-language-specific Unicode collations, such as utf8_general_ci and utf8_unicode_ci, certain characters compare as equal. Specifically, 'Ä', 'Ö', and 'Ü' are treated as equal to their base letters 'A', 'O', and 'U'. The stored data is not changed; these collations simply ignore accents (and, as the _ci suffix indicates, letter case) when comparing and searching, so a search for 'Harligt' also matches 'Härligt'.
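The equivalence can be checked directly with a literal comparison, without touching any table. This is a minimal sketch that assumes the client sends UTF-8 and that the server still accepts the legacy utf8 collation names used in this article; the column aliases are purely illustrative.

```sql
-- Sketch only: assumes a UTF-8 client connection and a server that still
-- accepts the legacy utf8 (utf8mb3) character set and collation names.
SELECT
  _utf8'ä' = _utf8'a' COLLATE utf8_general_ci AS general_ci_equal,  -- expected 1
  _utf8'ä' = _utf8'a' COLLATE utf8_unicode_ci AS unicode_ci_equal,  -- expected 1
  _utf8'Ä' = _utf8'a' COLLATE utf8_general_ci AS case_and_accent;   -- expected 1
```

COLLATE binds more tightly than =, so each comparison is evaluated under the collation named on its right-hand operand.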
To bypass this equivalence and search for the exact string 'Härligt', you can apply the utf8_bin collation to the query. A binary collation compares the characters exactly, so the match is both accent-sensitive and case-sensitive:
select * from topics where name='Härligt' COLLATE utf8_bin;
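To see what a binary comparison accepts and rejects, the following sketch compares literals only. It assumes a UTF-8 client connection and the legacy utf8 names used above; the column aliases are illustrative.

```sql
-- Sketch only: literal comparisons under utf8_bin, assuming a UTF-8 client.
SELECT
  _utf8'Härligt' = _utf8'Harligt' COLLATE utf8_bin AS accent_ignored,  -- expected 0
  _utf8'Härligt' = _utf8'härligt' COLLATE utf8_bin AS case_ignored,    -- expected 0
  _utf8'Härligt' = _utf8'Härligt' COLLATE utf8_bin AS exact_match;     -- expected 1
```

Only the last comparison succeeds, which is why utf8_bin is a poor fit when the search should still ignore letter case.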
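Alternatively, you can use a binary collation from a different character set for the query only, such as latin1_bin. This works only if the string literal is itself in latin1, because MySQL rejects a COLLATE clause whose collation does not belong to the character set of the expression it is applied to:
select * from topics where name = 'Härligt' COLLATE latin1_bin;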
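However, if you require case-insensitive searches without the Ä = A equivalence, neither the binary collations nor utf8_general_ci and utf8_unicode_ci will do, and older MySQL versions offer no language-neutral collation that is case-insensitive but accent-sensitive. Language-specific collations such as utf8_swedish_ci do treat å, ä, and ö as distinct letters, and MySQL 8.0 adds accent-sensitive, case-insensitive collations such as utf8mb4_0900_as_ci.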
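One approach worth trying, assuming a MySQL 8.0 or later server, is the accent-sensitive, case-insensitive collation utf8mb4_0900_as_ci. The sketch below first checks the behavior on literals and then applies it to the topics table from the earlier queries; the CONVERT call is an assumption made so that the query also works when the column is stored as utf8 rather than utf8mb4.

```sql
-- Sketch only: requires MySQL 8.0+ (utf8mb4_0900_as_ci is not available earlier).
SELECT
  CONVERT('Härligt' USING utf8mb4) COLLATE utf8mb4_0900_as_ci = 'Harligt' AS matches_unaccented,  -- expected 0
  CONVERT('Härligt' USING utf8mb4) COLLATE utf8mb4_0900_as_ci = 'härligt' AS matches_lowercase;   -- expected 1

-- Same idea applied to the table used in this article.
SELECT *
FROM topics
WHERE CONVERT(name USING utf8mb4) COLLATE utf8mb4_0900_as_ci = 'Härligt';
```

Wrapping the column in CONVERT will typically prevent MySQL from using an index on name, so for frequent searches it may be preferable to declare the column itself with the desired collation.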