I am new to Python. While writing a Scrapy crawler, I ran into HTML character entities, so I searched Baidu and found this approach:
import HTMLParser
html_parser = HTMLParser.HTMLParser()
s = '&l t;abc&g t;&nbs p;'  # Leave a space to avoid web page escaping
s = html_parser.unescape(s)
Running it produces this error:
import markupbase
ImportError: No module named 'markupbase'
With the help of translation software, I read the official HTMLParser documentation and found a second method:
from html.parser import HTMLParser
class MyHTMLParser(HTMLParser):
    def handle_data(self, data):
        print(data)
        return data

parser = MyHTMLParser()
s = '&l t;abc&g t;&nbs p;'  # A space is left to avoid web page escaping
ss = parser.feed(s)
The second method runs and prints the text, but the "return data" statement does not seem to work: the value never reaches ss.
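For reference, here is a minimal sketch of one way to get a value out of the second method, assuming Python 3. As far as I can tell, feed() itself returns None and whatever handle_data returns is discarded, so the sketch stores the text on the parser instance instead. The names CollectingParser and extract_text are hypothetical, and the entities are written here without the extra spaces.

```python
from html.parser import HTMLParser


class CollectingParser(HTMLParser):  # hypothetical name, for illustration only
    def __init__(self):
        # convert_charrefs defaults to True, so entities arrive already decoded
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        # Store each piece of text instead of returning it
        self.chunks.append(data)


def extract_text(markup):
    """Feed markup to a fresh parser and return the collected text."""
    parser = CollectingParser()
    parser.feed(markup)
    parser.close()  # flush any text the parser is still buffering
    return "".join(parser.chunks)


text = extract_text("&lt;abc&gt;&nbsp;")
print(repr(text))
```

The result is read from the chunks list after feeding, rather than from the return value of feed().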
Is there a way to unescape these entities in just a few lines of code? If not, how can I get a return value out of the second method?
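A short sketch of what the few-line version might look like, assuming Python 3.4 or later, where the standard library provides html.unescape. The import error above is consistent with running Python 2 style code on Python 3, where the HTMLParser module was renamed to html.parser. The entities are written here without the extra spaces.

```python
import html

s = "&lt;abc&gt;&nbsp;"

# html.unescape converts named and numeric character references to text
decoded = html.unescape(s)

# &nbsp; becomes the non-breaking space character U+00A0
print(repr(decoded))
```

No parser subclass is needed in this case, since the function returns the decoded string directly.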