How to Remove Non-Printable Characters from Strings in Python?

Patricia Arquette
Release: 2024-10-22 06:57:02

Stripping Non-Printable Characters from a String in Python

In contrast to Perl, Python's re module does not support POSIX character classes such as [[:print:]], so there is no ready-made pattern for detecting and removing non-printable characters with regular expressions.

So, how can you achieve this in Python?

One approach is to leverage the unicodedata module. The unicodedata.category function returns the Unicode general category of a character as a two-letter code. For instance, characters in category Cc (control) are non-printable.
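As a quick illustration of how unicodedata.category classifies characters, here is a minimal sketch. The helper name describe and the sample string are assumptions made for this example, not part of the original article.

```python
import unicodedata

def describe(text):
    """Hypothetical helper: pair each character with its general category."""
    return [(ch, unicodedata.category(ch)) for ch in text]

# Sample input chosen only for illustration: a letter, a space,
# a newline, DEL, and a zero-width space.
for ch, cat in describe('a \n\x7f\u200b'):
    print(repr(ch), cat)
```

The newline and DEL report Cc, while the zero-width space reports Cf (format), so it would not be caught by a filter that looks only at Cc.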

Using this knowledge, you can construct a custom character class that matches all control characters:

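<code class="python">import re
import sys
import unicodedata

# Every code point, including the last one (hence the + 1).
all_chars = (chr(i) for i in range(sys.maxunicode + 1))
# Unicode general categories to strip; 'Cc' is the control-character category.
categories = {'Cc'}
control_chars = ''.join(c for c in all_chars if unicodedata.category(c) in categories)

# Escape the characters so none is treated as regex syntax inside the class.
control_char_re = re.compile('[%s]' % re.escape(control_chars))

def remove_control_chars(s):
    """Return s with every character in the selected categories removed."""
    return control_char_re.sub('', s)</code>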

This function strips every character in the Cc category from the input string, that is, U+0000 to U+001F and U+007F to U+009F. Note that this includes tab, newline, and carriage return.

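Alternatively, you can filter against the string.printable constant, keeping only the characters that appear in it. However, string.printable contains only ASCII characters (digits, letters, punctuation, and whitespace), so this approach also discards every non-ASCII character, such as accented letters, and may not suit all use cases.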
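A minimal sketch of the string.printable approach might look like the following. The function name keep_ascii_printable is an assumption made for this example.

```python
import string

# string.printable is a constant string; a set gives fast membership tests.
PRINTABLE = set(string.printable)

def keep_ascii_printable(s):
    """Hypothetical helper: drop every character not in string.printable."""
    return ''.join(ch for ch in s if ch in PRINTABLE)

print(keep_ascii_printable('caf\xe9\x00!'))  # prints caf!
```

The accented letter is dropped along with the NUL byte, which shows why this filter is only appropriate for ASCII-only data. Tabs and newlines are kept, because string.printable includes ASCII whitespace.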

Because the Cc category consists of exactly the ranges U+0000 to U+001F and U+007F to U+009F, you can also build the character class from those ranges directly instead of scanning every code point:

<code class="python">import itertools
control_chars = ''.join(map(chr, itertools.chain(range(0x00,0x20), range(0x7f,0xa0))))</code>

This character class covers the ASCII control characters (including DEL) and the C1 control characters in the Latin-1 range.

With control_chars defined this way, the rest of remove_control_chars stays unchanged and removes the same set of characters.
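Put together, the range-based variant can be sketched as a self-contained module. This is one possible assembly of the pieces shown above, not a definitive implementation.

```python
import itertools
import re

# C0 controls (U+0000 to U+001F) plus DEL and the C1 controls (U+007F to U+009F).
control_chars = ''.join(
    map(chr, itertools.chain(range(0x00, 0x20), range(0x7f, 0xa0)))
)
control_char_re = re.compile('[%s]' % re.escape(control_chars))

def remove_control_chars(s):
    """Return s with all C0 and C1 control characters removed."""
    return control_char_re.sub('', s)

print(remove_control_chars('caf\xe9\x00 ok\x9f'))  # prints café ok
```

Unlike the string.printable filter, this keeps non-ASCII text such as é intact. It also removes tab and newline, so if those should survive, they would need to be excluded from the ranges.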
