Efficiently Removing Non-ASCII Characters in C# Strings
Data cleaning often requires removing non-ASCII characters from strings. C#'s Regex.Replace
method provides a concise solution for this common task.
Example:
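<code class="language-csharp">using System.Text.RegularExpressions;

string inputString = "søme string with non-ASCII characters.";
// Remove every run of characters outside the ASCII range (U+0000 to U+007F).
string outputString = Regex.Replace(inputString, @"[^\u0000-\u007F]+", "");
// outputString is "sme string with non-ASCII characters."</code>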
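When the same cleanup runs in many places, the call can be wrapped in a small helper. The following is a minimal sketch rather than part of the original example: the names AsciiCleaner and RemoveNonAscii are hypothetical, and caching the Regex instance in a static field is an optional design choice that avoids re-parsing the pattern on every call.

```csharp
using System;
using System.Text.RegularExpressions;

public static class AsciiCleaner
{
    // Hypothetical helper, not from the original article.
    // The pattern matches one or more characters outside U+0000..U+007F.
    private static readonly Regex NonAscii =
        new Regex(@"[^\u0000-\u007F]+", RegexOptions.Compiled);

    public static string RemoveNonAscii(string input)
    {
        if (input == null)
        {
            throw new ArgumentNullException(nameof(input));
        }

        // Regex.Replace returns a new string; the input is not modified.
        return NonAscii.Replace(input, string.Empty);
    }
}

public static class Program
{
    public static void Main()
    {
        // "\u00F8" is the character ø, written as an escape for clarity.
        string cleaned = AsciiCleaner.RemoveNonAscii("s\u00F8me string");
        Console.WriteLine(cleaned); // prints "sme string"
    }
}
```

Note that the non-ASCII character is deleted, not converted to a close ASCII equivalent, so "søme" becomes "sme" rather than "some".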
Breakdown of the Regular Expression:
The regular expression [^\u0000-\u007F]+ matches every run of characters outside the ASCII range. Let's break it down:
[^...]: A negated character class. It matches any single character that is not in the specified set.
\u0000-\u007F: The range of ASCII characters (Unicode code points 0 to 127).
+: A quantifier that matches one or more consecutive non-ASCII characters, so each run is replaced in a single step.
Replacing the matched characters with an empty string ("") yields a new string without them. The original string is left unchanged, because .NET strings are immutable. This gives a concise way to strip non-ASCII characters in C#.
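To make the role of the quantifier concrete, the sketch below counts matches with and without it. The class QuantifierDemo and the method CountMatches are hypothetical names chosen for illustration; only Regex.Matches and Regex.Replace come from the .NET library.

```csharp
using System;
using System.Text.RegularExpressions;

public static class QuantifierDemo
{
    // Hypothetical helper: counts how many separate matches a pattern finds.
    public static int CountMatches(string input, string pattern)
    {
        return Regex.Matches(input, pattern).Count;
    }

    public static void Main()
    {
        // "\u00E9\u00E8" is two adjacent non-ASCII characters (é and è);
        // "\u00F8" (ø) appears on its own later in the string.
        string input = "a\u00E9\u00E8b\u00F8c";

        // With "+", the adjacent pair is one match and ø is a second match.
        Console.WriteLine(CountMatches(input, @"[^\u0000-\u007F]+")); // prints 2

        // Without "+", each non-ASCII character is its own match.
        Console.WriteLine(CountMatches(input, @"[^\u0000-\u007F]"));  // prints 3

        // The cleaned result is the same either way.
        Console.WriteLine(Regex.Replace(input, @"[^\u0000-\u007F]+", "")); // prints abc
    }
}
```

The quantifier therefore does not change the output; it only reduces the number of individual replacements performed.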