
How Python handles HTML escape characters

高洛峰
Release: 2017-03-01 13:27:57

This article describes how Python handles HTML escape characters, with examples for reference.

Recently, while using Python to process web page data, I kept running into HTML escape characters (also called HTML character entities), such as &nbsp;, &lt; and &gt;. Character entities are generally used to represent reserved characters in web pages; for example, > is written as &gt; so that the browser does not mistake it for part of a tag. For details, see w3school's page on HTML character entities. Useful as they are, they get in the way when parsing web data. There are the following ways to handle them:
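To make the idea of a character entity concrete, here is a minimal sketch that looks up the code point behind a few named entities. It assumes Python 3 (the snippets in this article are Python 2) and uses the standard `html.entities` module; the helper name `describe_entity` is made up for illustration.

```python
from html.entities import name2codepoint


def describe_entity(name):
    """Return a readable description of one named entity, e.g. 'lt'."""
    code_point = name2codepoint[name]
    return "&%s; -> U+%04X %r" % (name, code_point, chr(code_point))


# Show what a few common entities stand for.
for name in ("nbsp", "lt", "gt", "amp"):
    print(describe_entity(name))
```

Note that &nbsp; maps to U+00A0, a non-breaking space, which is not the same character as an ordinary space.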

1. Use HTMLParser

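# Python 2 (the HTMLParser module and the print statement are Python 2 only)
import HTMLParser

html_cont = "&nbsp;asdfg&gt;123&lt;"
html_parser = HTMLParser.HTMLParser()
new_cont = html_parser.unescape(html_cont)
print new_cont  # new_cont = u"\xa0asdfg>123<" (&nbsp; becomes a non-breaking space)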
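The snippet above targets Python 2. On Python 3 the same decoding step can be sketched with the standard `html` module; this is an equivalent under that assumption, not the code the article originally used.

```python
import html

html_cont = "&nbsp;asdfg&gt;123&lt;"

# html.unescape decodes named and numeric character references.
new_cont = html.unescape(html_cont)

# repr() makes the non-breaking space visible as \xa0.
print(repr(new_cont))  # '\xa0asdfg>123<'
```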

To convert back (note that the non-breaking space is not converted back to &nbsp;):

# Python 2; cgi.escape escapes &, < and > only
import cgi

new_cont = cgi.escape(new_cont)
print new_cont  # new_cont = u"\xa0asdfg&gt;123&lt;"
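On Python 3, `html.escape` plays the role of `cgi.escape`, and it also leaves the non-breaking space alone. One possible way to get &nbsp; back is an extra replace afterwards. The sketch below assumes Python 3, and `escape_with_nbsp` is a hypothetical helper name.

```python
import html


def escape_with_nbsp(text):
    """Escape &, < and >, then turn non-breaking spaces back into &nbsp;."""
    # Escape first, so the '&' of the '&nbsp;' added below is not escaped again.
    escaped = html.escape(text, quote=False)
    return escaped.replace("\xa0", "&nbsp;")


print(escape_with_nbsp("\xa0asdfg>123<"))  # &nbsp;asdfg&gt;123&lt;
```

The order matters: replacing the non-breaking space before escaping would produce `&amp;nbsp;`.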

2. Replace

html_cont = "&nbsp;asdfg&gt;123&lt;"
new_cont = html_cont.replace('&nbsp;', ' ')  # start from html_cont
print new_cont  # new_cont = " asdfg&gt;123&lt;"
new_cont = new_cont.replace('&gt;', '>')
print new_cont  # new_cont = " asdfg>123&lt;"
new_cont = new_cont.replace('&lt;', '<')
print new_cont  # new_cont = " asdfg>123<"
These calls replace each entity directly. I don't know if there is a better way.
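One possible refinement of the replace approach is to do all substitutions in a single pass with a lookup table and a regular expression. This is only a sketch, assuming Python 3; `ENTITY_TABLE`, `ENTITY_PATTERN` and `replace_entities` are hypothetical names, and the table covers just a handful of entities.

```python
import re

# Hypothetical lookup table; extend it with whatever entities your data uses.
# Map "&nbsp;" to " " instead if an ordinary space is preferred.
ENTITY_TABLE = {
    "&nbsp;": "\xa0",
    "&lt;": "<",
    "&gt;": ">",
    "&amp;": "&",
    "&quot;": '"',
}

# One pattern that matches any key of the table.
ENTITY_PATTERN = re.compile("|".join(re.escape(key) for key in ENTITY_TABLE))


def replace_entities(text):
    """Replace every known entity in a single left-to-right pass."""
    return ENTITY_PATTERN.sub(lambda match: ENTITY_TABLE[match.group(0)], text)


print(repr(replace_entities("&nbsp;asdfg&gt;123&lt;")))  # '\xa0asdfg>123<'
print(replace_entities("&amp;lt;"))  # &lt;
```

Because each part of the input is examined once, `&amp;lt;` decodes to the text `&lt;`. Chained `replace` calls that handle `&amp;` before `&lt;` would decode it twice and return `<`.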

In addition, Stack Overflow has an answer on handling escape characters in XML: "python - What's the best way to handle &nbsp;-like entities in XML documents with lxml?".
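For context on why XML needs special care: XML predefines only five entities (amp, lt, gt, quot, apos), so a parser rejects &nbsp; unless a DTD declares it. The sketch below is not the approach from the linked answer and does not use lxml. It assumes Python 3 and the standard library only, and rewrites HTML named entities as numeric references before parsing. `html_entities_to_numeric` is a hypothetical helper, and it does not special-case CDATA sections or comments.

```python
import re
import xml.etree.ElementTree as ET
from html.entities import name2codepoint

# The five entities every XML parser understands without a DTD.
XML_PREDEFINED = {"amp", "lt", "gt", "quot", "apos"}
NAMED_ENTITY = re.compile(r"&([A-Za-z][A-Za-z0-9]*);")


def html_entities_to_numeric(xml_text):
    """Rewrite HTML named entities (e.g. &nbsp;) as numeric references."""
    def to_numeric(match):
        name = match.group(1)
        # Leave XML's own entities and unknown names untouched.
        if name in XML_PREDEFINED or name not in name2codepoint:
            return match.group(0)
        return "&#%d;" % name2codepoint[name]

    return NAMED_ENTITY.sub(to_numeric, xml_text)


root = ET.fromstring(html_entities_to_numeric("<p>a&nbsp;b &lt; c</p>"))
print(repr(root.text))  # 'a\xa0b < c'
```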

