Converting Accented Characters to Base Characters in PHP
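PHP has no single built-in function that replaces accented characters with their base counterparts, but two approaches cover most cases: Unicode normalization with the intl extension's Normalizer class, and a custom function built on HTML entities.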
Using Normalizer Class
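The Normalizer class, provided by the intl extension, performs Unicode normalization. Normalizing to form D does not remove accents by itself: it only splits each accented character into its base letter followed by one or more combining marks. To remove the accents, decompose the string and then strip those marks:
<code class="php">// Requires the intl extension. FORM_D splits "é" into "e" plus the combining accent U+0301.
$string = Normalizer::normalize($string, Normalizer::FORM_D);
// Strip the combining marks (Unicode category Mn) that the decomposition leaves behind.
$string = preg_replace('/\p{Mn}/u', '', $string);</code>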
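A minimal sketch of the full round trip, assuming the intl extension is loaded and the input is valid UTF-8. It decomposes with FORM_D and then drops the combining marks, since normalization by itself leaves them in place. The helper name strip_accents_nfd is hypothetical and not part of PHP or intl.

```php
<?php
// Hypothetical helper (not a PHP or intl built-in): decompose, then drop combining marks.
function strip_accents_nfd(string $text): string
{
    // FORM_D (canonical decomposition) turns "é" into "e" followed by U+0301.
    $decomposed = Normalizer::normalize($text, Normalizer::FORM_D);
    if ($decomposed === false) {
        // normalize() returns false on failure; hand the input back untouched.
        return $text;
    }
    // \p{Mn} matches non-spacing marks, which includes the combining accents.
    $stripped = preg_replace('/\p{Mn}/u', '', $decomposed);
    return $stripped ?? $text;
}

$precomposed = "\u{00E9}"; // "é" as a single code point
$decomposed  = Normalizer::normalize($precomposed, Normalizer::FORM_D);

echo strlen($precomposed), "\n"; // 2 (bytes in UTF-8)
echo strlen($decomposed), "\n";  // 3 ("e" plus a two-byte combining mark)
echo strip_accents_nfd("\u{00E3}\u{00E9}"), "\n"; // ae
```

Letters that have no canonical decomposition, such as "ł" or "ø", are left unchanged by this approach.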
Custom Accented Character Replacement
If you do not wish to use the Normalizer class or need to customize the accent replacement, you can use the following function:
Code:
<code class="php">function unaccent($string) {
    // '$1' keeps the letter(s) captured before the accent name, e.g. "e" from "&eacute;".
    return preg_replace('~&([a-z]{1,2})(acute|cedil|circ|grave|lig|orn|ring|slash|th|tilde|uml);~i', '$1', htmlentities($string, ENT_QUOTES, 'UTF-8'));
}</code>
Example Usage:
To replace "ã" with "a" and "é" with "e":
<code class="php">$string = "ãé";
$unaccentedString = unaccent($string); // "ae"</code>
How it Works:
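This function first converts accented characters into their named HTML entities using htmlentities(), so "é" becomes "&eacute;". The regular expression then matches each entity whose name consists of one or two base letters followed by an accent name (acute, grave, uml, and so on) and replaces the whole entity with those base letters.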
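The two stages can be traced separately. The sketch below is only an illustration of the technique described above: it uses the same pattern, with '$1' as the replacement so that the captured base letters are kept. The variable names are arbitrary.

```php
<?php
$input = "\u{00E3}\u{00E9}"; // "ãé"

// Stage 1: accented letters become named HTML entities.
$entities = htmlentities($input, ENT_QUOTES, 'UTF-8');
echo $entities, "\n"; // &atilde;&eacute;

// Stage 2: replace each entity with the letter(s) captured before the accent name.
$pattern = '~&([a-z]{1,2})(acute|cedil|circ|grave|lig|orn|ring|slash|th|tilde|uml);~i';
$plain = preg_replace($pattern, '$1', $entities);
echo $plain, "\n"; // ae
```

The {1,2} quantifier is what lets ligatures through: "æ" is encoded as "&aelig;", so the capture group holds "ae".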
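This method handles the most common accents, but it is not exhaustive: it only covers characters that have a named HTML 4.01 entity, so letters such as "č" or "ł" pass through unchanged. Because htmlentities() also escapes characters such as & and <, those remain as entities in the returned string.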
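One possible way to work around these gaps is sketched below, as an assumption rather than part of the original function. The name unaccent_with_map and its $extra parameter are hypothetical: the caller supplies replacements for letters the entity table does not know, and html_entity_decode() is added at the end so that characters such as & come back as plain text.

```php
<?php
// Hypothetical variant: optional caller-supplied map, then the entity-based replacement.
function unaccent_with_map(string $text, array $extra = []): string
{
    // Apply the extra replacements first (e.g. "č" => "c").
    if ($extra !== []) {
        $text = strtr($text, $extra);
    }
    $entities = htmlentities($text, ENT_QUOTES, 'UTF-8');
    $plain = preg_replace(
        '~&([a-z]{1,2})(acute|cedil|circ|grave|lig|orn|ring|slash|th|tilde|uml);~i',
        '$1',
        $entities
    );
    // Turn the remaining entities (&amp;, &lt;, ...) back into plain characters.
    return html_entity_decode($plain, ENT_QUOTES, 'UTF-8');
}

$word = "\u{010D}a\u{00E9}"; // "čaé"

echo unaccent_with_map($word), "\n";                      // čae ("č" has no HTML 4.01 entity)
echo unaccent_with_map($word, ["\u{010D}" => 'c']), "\n"; // cae
echo unaccent_with_map("caf\u{00E9} & t\u{00E9}"), "\n";  // cafe & te
```

The map has to be maintained by hand, so this only suits a small, known set of extra characters.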