How Can I Compare Strings in C# While Ignoring Accented Characters?

Linda Hamilton
Release: 2025-01-24 16:07:10

Ignore accented characters when comparing strings in C#

Handling string comparisons with accented characters can be tricky in C#. Consider the following example:

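<code class="language-csharp">string s1 = "hello";
string s2 = "héllo";

// Neither comparison mode ignores diacritics, so both results are false.
bool invariantEqual = s1.Equals(s2, StringComparison.InvariantCultureIgnoreCase);  // false
bool ordinalEqual = s1.Equals(s2, StringComparison.OrdinalIgnoreCase);             // false</code>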

Many applications want these two strings to be treated as equal, but both calls return false. Neither comparison mode ignores diacritics, so "é" and "e" are treated as different characters. One way to solve this is to strip the diacritics (accent marks) from the strings before comparing them.

Remove diacritics

Here's a way to remove diacritics from a string:

<code class="language-csharp">using System.Globalization;
using System.Text;

static string RemoveDiacritics(string text)
{
    // Normalize the string to Unicode Normalization Form D (canonical decomposition)
    string formD = text.Normalize(NormalizationForm.FormD);

    // Create a StringBuilder to hold the resulting string
    StringBuilder sb = new StringBuilder();

    // Iterate over the characters of the normalized string
    foreach (char ch in formD)
    {
        // Keep the character only if it is not a non-spacing mark
        UnicodeCategory uc = CharUnicodeInfo.GetUnicodeCategory(ch);
        if (uc != UnicodeCategory.NonSpacingMark)
        {
            // Append the character to the StringBuilder
            sb.Append(ch);
        }
    }

    // Convert the StringBuilder to a string and normalize it back to Form C
    return sb.ToString().Normalize(NormalizationForm.FormC);
}</code>

This method first normalizes the string to Unicode Normalization Form D, which decomposes each accented character into a base character followed by one or more combining marks. It then iterates over the characters and drops those in the NonSpacingMark category (the accents). Finally, it normalizes what remains back to Form C, producing an unaccented string.
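To make the decomposition step concrete, here is a minimal sketch that prints what Form D does to a single accented character. The class name `DecompositionDemo` and the helper `Describe` are hypothetical names chosen for illustration; only `Normalize` and `CharUnicodeInfo.GetUnicodeCategory` come from .NET.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.Text;

static class DecompositionDemo
{
    // Hypothetical helper: lists each UTF-16 code unit of the Form D
    // version of the input together with its Unicode category.
    public static string Describe(string text)
    {
        string decomposed = text.Normalize(NormalizationForm.FormD);
        var parts = new List<string>();
        foreach (char ch in decomposed)
        {
            parts.Add($"U+{(int)ch:X4} {CharUnicodeInfo.GetUnicodeCategory(ch)}");
        }
        return string.Join(", ", parts);
    }

    static void Main()
    {
        // "\u00E9" is the precomposed character é.
        Console.WriteLine(Describe("\u00E9"));
        // U+0065 LowercaseLetter, U+0301 NonSpacingMark
    }
}
```

The second entry, U+0301 (combining acute accent), is the kind of character that the `NonSpacingMark` check filters out, leaving only the base letter "e".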

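To use this method, apply it to both strings before comparing them, so that accents on either side are ignored. For example:

<code class="language-csharp">string s1 = "hello";
string s2 = "héllo";

// Strip diacritics from both operands, then compare case-insensitively
bool equal = RemoveDiacritics(s1).Equals(RemoveDiacritics(s2), StringComparison.InvariantCultureIgnoreCase);  // true</code>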

This evaluates to true, because the accented "é" in "héllo" is reduced to a plain "e" before the comparison takes place.
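As an alternative to stripping diacritics by hand, .NET's culture-aware comparison can be asked to ignore combining marks directly. The sketch below is a different technique from the `RemoveDiacritics` method above, not a replacement the original article proposes. It uses `CompareInfo.Compare` with `CompareOptions.IgnoreNonSpace`; the class and method names (`AccentInsensitive`, `EqualsIgnoringAccents`) are hypothetical. It assumes the runtime has normal globalization data available, since linguistic comparison options depend on it.

```csharp
using System;
using System.Globalization;

static class AccentInsensitive
{
    // Hypothetical helper name; CompareInfo and CompareOptions are .NET APIs.
    public static bool EqualsIgnoringAccents(string a, string b)
    {
        CompareInfo compareInfo = CultureInfo.InvariantCulture.CompareInfo;

        // IgnoreNonSpace ignores non-spacing combining marks such as accents;
        // IgnoreCase additionally ignores differences in letter case.
        CompareOptions options = CompareOptions.IgnoreNonSpace | CompareOptions.IgnoreCase;

        // Compare returns 0 when the strings are considered equal under these options.
        return compareInfo.Compare(a, b, options) == 0;
    }

    static void Main()
    {
        Console.WriteLine(EqualsIgnoringAccents("hello", "héllo"));
        Console.WriteLine(EqualsIgnoringAccents("hello", "help"));
    }
}
```

This avoids allocating intermediate strings, but the result is a comparison only. When the unaccented text itself is needed (for example, to build a search key), the `RemoveDiacritics` approach is still the one to use.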

source:php.cn