How Can I Efficiently Strip HTML Tags from Strings in Python?

Susan Sarandon
Release: 2024-12-28 22:26:10

Stripping HTML from Strings in Python

When working with HTML content, you often need to separate the meaningful text from the markup before further processing or analysis. Python's standard library can do this on its own, without any third-party packages.

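To strip HTML tags from a string, subclass HTMLParser from the Python standard library (it lives in the html.parser module on Python 3 and in the HTMLParser module on Python 2). As the parser walks the input, it calls handle_starttag() and handle_endtag() for tags and handle_data() for the text between them, so a subclass that records only what handle_data() receives ends up with the text and none of the markup.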
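
To see what the parser actually reports, here is a minimal sketch using a hypothetical CallbackTracer subclass (not part of the original article) that logs each handle_starttag(), handle_endtag() and handle_data() call for the sample markup this article uses. Only the data events carry text, which is why collecting them is enough to strip the tags.

```python
from html.parser import HTMLParser

class CallbackTracer(HTMLParser):
    """Hypothetical helper: records every callback HTMLParser fires."""

    def __init__(self):
        super().__init__()
        self.events = []

    def handle_starttag(self, tag, attrs):
        self.events.append(("start", tag))

    def handle_endtag(self, tag):
        self.events.append(("end", tag))

    def handle_data(self, data):
        # Only these events contain the document's text.
        self.events.append(("data", data))

tracer = CallbackTracer()
tracer.feed("<p>Hello, <em>world</em>!</p>")
tracer.close()
for event in tracer.events:
    print(event)
```

Tags arrive as separate events, so the text never has to be filtered out of the markup after the fact.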

For Python 3, use the following code:

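from io import StringIO
from html.parser import HTMLParser

class TagStripper(HTMLParser):
    """Collects the text content of an HTML document and discards the tags."""
    def __init__(self):
        # convert_charrefs=True (the default since Python 3.5) decodes
        # entities such as &amp; before the text reaches handle_data().
        super().__init__(convert_charrefs=True)
        self.text = StringIO()
    def handle_data(self, data):
        # Called for each run of text between tags.
        self.text.write(data)
    def get_data(self):
        return self.text.getvalue()

def strip_html(html):
    stripper = TagStripper()
    stripper.feed(html)
    stripper.close()  # flush any text the parser is still buffering
    return stripper.get_data()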
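
The convert_charrefs setting matters whenever the input contains entities such as &amp;. The sketch below uses a hypothetical TextCollector class (not from the original article) to parse the same string with the option on and off. With it on, the entity reaches handle_data() already decoded; with it off, the parser hands it to handle_entityref(), which does nothing by default, so the character silently disappears.

```python
from html.parser import HTMLParser

class TextCollector(HTMLParser):
    """Hypothetical minimal text collector used only to compare settings."""

    def __init__(self, convert_charrefs):
        super().__init__(convert_charrefs=convert_charrefs)
        self.parts = []

    def handle_data(self, data):
        self.parts.append(data)

def collect(html, convert_charrefs):
    parser = TextCollector(convert_charrefs)
    parser.feed(html)
    parser.close()
    return "".join(parser.parts)

sample = "<p>Fish &amp; chips</p>"
print(collect(sample, convert_charrefs=True))   # Fish & chips
print(collect(sample, convert_charrefs=False))  # Fish  chips (the & is lost)
```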

For legacy Python 2 code (Python 2 reached end of life in January 2020), the equivalent is:

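from HTMLParser import HTMLParser
from StringIO import StringIO

class TagStripper(HTMLParser):
    def __init__(self):
        # HTMLParser is an old-style class in Python 2, so call the base
        # initializer directly (it calls reset()) instead of super().
        HTMLParser.__init__(self)
        self.text = StringIO()
    def handle_data(self, data):
        # Entity and character references such as &amp; go to
        # handle_entityref()/handle_charref(), which are not overridden
        # here, so they are omitted from the result.
        self.text.write(data)
    def get_data(self):
        return self.text.getvalue()

def strip_html(html):
    stripper = TagStripper()
    stripper.feed(html)
    stripper.close()  # process any remaining buffered input
    return stripper.get_data()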

Both versions are used the same way:

html = "<p>Hello, <em>world</em>!</p>"
stripped_text = strip_html(html)
print(stripped_text)  # Output: Hello, world!
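
TagStripper keeps every text node exactly as it appears, so text from adjacent block elements runs together (<p>One</p><p>Two</p> becomes OneTwo), and the source inside <script> and <style> elements is returned as if it were ordinary text. The sketch below is one possible workaround, not the original author's method: a hypothetical BlockAwareStripper that skips script and style contents and breaks lines at common block-level tags. The BLOCK set is an assumption; adjust it to the markup you actually process.

```python
from html.parser import HTMLParser

class BlockAwareStripper(HTMLParser):
    """Hypothetical variant of TagStripper: drops <script>/<style> contents
    and inserts a newline at the boundaries of common block-level tags."""

    SKIP = {"script", "style"}
    BLOCK = {"p", "div", "br", "li", "tr", "h1", "h2", "h3", "h4", "h5", "h6"}

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.parts = []
        self.skip_depth = 0  # greater than 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self.skip_depth += 1
        elif tag in self.BLOCK:
            self.parts.append("\n")

    def handle_endtag(self, tag):
        if tag in self.SKIP:
            self.skip_depth = max(0, self.skip_depth - 1)
        elif tag in self.BLOCK:
            self.parts.append("\n")

    def handle_data(self, data):
        if not self.skip_depth:
            self.parts.append(data)

def strip_html_blocks(html):
    stripper = BlockAwareStripper()
    stripper.feed(html)
    stripper.close()
    text = "".join(stripper.parts)
    # Trim each line and drop the empty ones left by adjacent block tags.
    lines = (line.strip() for line in text.splitlines())
    return "\n".join(line for line in lines if line)

# Prints "One" and "Two" on separate lines; the script body is skipped.
print(strip_html_blocks("<p>One</p><p>Two</p><script>var x = 1;</script>"))
```

Inline tags such as <em> add no line break, so the earlier sample still comes out as "Hello, world!".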

source:php.cn