Stripping Non-Alphanumeric Characters from Strings in Python
Stripping non-alphanumeric characters from a string means removing everything except letters and digits. In the approaches below, underscores are removed as well. Solutions written for PHP exist, but they do not translate into idiomatic, Pythonic code.
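One efficient method is to compile a regular expression that matches runs of non-alphanumeric characters, [\W_]+, and replace every match with an empty string. \W matches any non-word character, and because the underscore counts as a word character, _ is added to the class explicitly. The compiled pattern's sub() method does the substitution, just as re.sub() would: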
<code class="python">import re
import string

pattern = re.compile(r'[\W_]+')  # compiled regex: non-word characters plus "_"
cleaned = pattern.sub('', string.printable)  # replace each run with ''
</code>
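As a minimal sketch, the compiled pattern can be wrapped in a reusable helper. The function names strip_non_alnum and strip_non_ascii_alnum are hypothetical. For str patterns in Python 3, \W is Unicode-aware, so accented letters such as "é" are kept; adding the re.ASCII flag restricts the surviving characters to ASCII letters and digits.

```python
import re

# Compiled once and reused; the raw string avoids an invalid-escape warning for "\W".
NON_ALNUM = re.compile(r'[\W_]+')

# Same pattern with re.ASCII: only [a-zA-Z0-9] survive.
NON_ASCII_ALNUM = re.compile(r'[\W_]+', re.ASCII)


def strip_non_alnum(text: str) -> str:
    """Hypothetical helper: drop every character that is not a Unicode letter or digit."""
    return NON_ALNUM.sub('', text)


def strip_non_ascii_alnum(text: str) -> str:
    """Hypothetical helper: drop every character that is not an ASCII letter or digit."""
    return NON_ASCII_ALNUM.sub('', text)


print(strip_non_alnum("Hello, World! 123_abc"))  # HelloWorld123abc
print(strip_non_alnum("café au lait"))           # caféaulait
print(strip_non_ascii_alnum("café au lait"))     # cafaulait
```

Which variant fits depends on whether non-ASCII letters should count as "alphanumeric" in your data.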
Alternatives include keeping only the characters for which str.isalnum() is true, either with a generator expression or by passing str.isalnum to filter(), and joining the result back into a string:
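<code class="python">text = "Hello, World! 123_abc"
''.join(ch for ch in text if ch.isalnum())  # 'HelloWorld123abc'
''.join(filter(str.isalnum, text))  # filter() returns an iterator in Python 3
</code>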
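To check that the alternatives behave like the regular expression, the sketch below runs all three on a few sample strings. The helper names strip_with_join, strip_with_filter and strip_with_regex are hypothetical. On these inputs the three produce identical output.

```python
import re

NON_ALNUM = re.compile(r'[\W_]+')


def strip_with_join(text: str) -> str:
    # Generator expression: keep only characters for which isalnum() is True.
    return ''.join(ch for ch in text if ch.isalnum())


def strip_with_filter(text: str) -> str:
    # filter() is lazy in Python 3, so join its output back into a str.
    return ''.join(filter(str.isalnum, text))


def strip_with_regex(text: str) -> str:
    # Compiled-pattern approach: delete each run of non-alphanumerics.
    return NON_ALNUM.sub('', text)


samples = ["Hello, World! 123_abc", "tab\tand\nnewline", "naïve résumé"]
for s in samples:
    print(repr(s), strip_with_join(s), strip_with_filter(s), strip_with_regex(s))
```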
Comparative benchmarking found the compiled regular expression with sub() to be the fastest of these approaches. Its timing on string.printable was:
<code class="shell">$ python -m timeit -s \
    "import re, string; pattern = re.compile('[\W_]+')" \
    "pattern.sub('', string.printable)"
100000 loops, best of 3: 11.2 usec per loop
</code>
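The figure above comes from a single machine, and the "best of 3" wording dates from an older version of the timeit command-line output, so it is worth re-measuring on your own interpreter. The script below is a sketch that times all three approaches with the timeit module on the same input. The CANDIDATES table and the benchmark() helper are illustrative names, not part of any library, and the ranking you observe may differ across Python versions and hardware.

```python
import re
import string
import timeit

PATTERN = re.compile(r'[\W_]+')
TEXT = string.printable  # same input as the command-line benchmark

# Hypothetical table of the approaches being compared.
CANDIDATES = {
    "compiled re.sub": lambda: PATTERN.sub('', TEXT),
    "generator + join": lambda: ''.join(ch for ch in TEXT if ch.isalnum()),
    "filter + join": lambda: ''.join(filter(str.isalnum, TEXT)),
}


def benchmark(number: int = 10_000, repeat: int = 5) -> dict[str, float]:
    """Return the best per-call time, in microseconds, for each candidate."""
    results = {}
    for name, func in CANDIDATES.items():
        # repeat() returns one total time per run; the minimum is the least noisy.
        best = min(timeit.repeat(func, number=number, repeat=repeat))
        results[name] = best / number * 1e6
    return results


if __name__ == "__main__":
    for name, usec in sorted(benchmark().items(), key=lambda kv: kv[1]):
        print(f"{name:>18}: {usec:.2f} usec per call")
```

Taking the minimum of several runs follows the same reasoning as the timeit command line: higher readings are mostly noise from other processes, not the code under test.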