Remove Accents/Diacritics in a String in JavaScript
Removing accents (diacritics) from strings is a common step in text processing and data analysis. In the provided code, the accentsTidy function attempts to remove accents using regular expressions. However, this approach may be neither efficient nor reliable, especially in older browsers like IE6.
ES2015/ES6 Solution
A more modern and efficient solution is the ES2015/ES6 String.prototype.normalize() method, which returns a Unicode normalization form of a string. The "NFD" form decomposes each precomposed character into its base character followed by combining marks, so the diacritics can then be removed with a simple replace. Here's an example:
const str = "Crème Brûlée";
str.normalize("NFD").replace(/[\u0300-\u036f]/g, ""); // "Creme Brulee"
The regular expression matches the Unicode range U+0300 → U+036F, the Combining Diacritical Marks block. Other Unicode normal forms can be used as well: the compatibility form "NFKD" additionally decomposes characters such as \uFB01 (the ﬁ ligature) into their plain equivalents ("fi").
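To make the difference between the two normal forms concrete, here is a minimal sketch. The helper name removeDiacritics and its optional form parameter are illustrative assumptions, not part of any standard API; only normalize() and replace() are built in.

```javascript
// Hypothetical helper (the name and the `form` parameter are assumptions).
// Decomposes the input, then drops combining marks in U+0300 to U+036F.
function removeDiacritics(input, form = "NFD") {
  return input.normalize(form).replace(/[\u0300-\u036f]/g, "");
}

const plain = removeDiacritics("Crème Brûlée");          // "Creme Brulee"

// "\uFB01" is the single-character "fi" ligature.
const canonical = removeDiacritics("\uFB01ancé");         // ligature kept: "\uFB01ance"
const compat = removeDiacritics("\uFB01ancé", "NFKD");    // ligature expanded: "fiance"
```

Note that letters with no canonical decomposition, such as "ø" or "ł", pass through unchanged under either form, so they would still need an explicit mapping if they must be converted.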
Using Unicode Property Escapes
ES2018 introduced Unicode property escapes, providing a more concise way to remove diacritics:
str.normalize("NFD").replace(/\p{Diacritic}/gu, ""); // "Creme Brulee"
This escape matches all characters with the Unicode property "Diacritic". The u flag is required for \p{...} to be recognized.
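Because property escapes only exist in ES2018 and later engines, code that must also run on older runtimes could detect support and fall back to the explicit range. The following is a sketch under that assumption; buildDiacriticPattern and stripDiacritics are hypothetical names chosen for illustration.

```javascript
// Hypothetical helper: choose the pattern once, when the script loads.
function buildDiacriticPattern() {
  try {
    // The RegExp constructor throws a SyntaxError on engines without
    // Unicode property escapes, which try/catch can handle. A regex
    // literal would instead fail while the script is being parsed.
    return new RegExp("\\p{Diacritic}", "gu");
  } catch (err) {
    // Fallback: the Combining Diacritical Marks range used earlier.
    return /[\u0300-\u036f]/g;
  }
}

const diacriticPattern = buildDiacriticPattern();

// Hypothetical helper: decompose, then remove whatever the pattern matches.
function stripDiacritics(input) {
  return input.normalize("NFD").replace(diacriticPattern, "");
}

const stripped = stripDiacritics("Crème Brûlée"); // "Creme Brulee"
```

The two patterns are not strictly equivalent: the Diacritic property is broader than the U+0300 → U+036F range and also covers some standalone characters such as ^ and `, so results can differ on input that contains them.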
Alternatively: Sorting
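If the goal is only to sort strings that contain accents, there is no need to strip them: the Intl.Collator object compares strings according to locale-aware collation rules. With the default options, accented letters sort alongside their base letters, and a diacritic only breaks ties between otherwise identical strings. Here's an example: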
const c = new Intl.Collator();
["creme brulee", "crème brûlée", "crame brulai", "crome brouillé", "creme brulay", "creme brulfé", "creme bruléa"].sort(c.compare);
// ['crame brulai', 'creme brulay', 'creme bruléa', 'creme brulee', 'crème brûlée', 'creme brulfé', 'crome brouillé']
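A related case is comparing or searching strings while ignoring accents altogether. One possible approach, sketched here, is the collator's sensitivity option. The explicit "en" locale and the helper name sameIgnoringAccents are assumptions made for illustration; omitting the locale uses the runtime's default.

```javascript
// sensitivity: "base" treats strings as equal when only their accents
// (or letter case) differ. The "en" locale is an assumption.
const baseCollator = new Intl.Collator("en", { sensitivity: "base" });

// Hypothetical helper: true when the collator reports no difference.
const sameIgnoringAccents = (a, b) => baseCollator.compare(a, b) === 0;

const accentsOnly = sameIgnoringAccents("crème brûlée", "creme brulee"); // true
const caseToo = sameIgnoringAccents("Crème", "creme");                   // true
const different = sameIgnoringAccents("creme", "crome");                 // false
```

If case should still matter, sensitivity: "accent" does the opposite: it distinguishes accents but ignores case, so neither setting alone gives "ignore accents, keep case".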