#!/usr/bin/python
# -*- coding: utf-8 -*-
import urllib2
import re
import HTMLParser
class WALLSTREET:
    def __init__(self, baseUrl):
        self.url = baseUrl

    def get_html_content(self):
        url = self.url
        response = urllib2.urlopen(url)
        str = response.read()
        print str
baseUrl = "https://wallstreetcn.com/live/global"  # Wallstreetcn URL
ws = WALLSTREET(baseUrl)
ws.get_html_content()
That is the code. It is very simple, but what it prints is garbled.
I tried print str.decode("utf-8")
but it raised an error:
UnicodeDecodeError: 'utf8' codec can't decode byte 0x8b in position 1: invalid start byte
There are two problems with the line str = response.read():
1. str is the name of a built-in type, so assigning to it shadows the built-in. Use a different variable name.
2. Check the encoding of the page source. If it is utf-8, add .decode('utf-8') after read(); if it is something else, decode accordingly.
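One more thing worth checking: the byte 0x8b at position 1 in the error message matches the second byte of the gzip magic number (0x1f 0x8b), so the body may be gzip-compressed rather than in a different text encoding. That is an inference from the traceback, not something confirmed for this site. A minimal sketch under that assumption follows; decode_body and gzip_bytes are hypothetical helper names, and utf-8 is assumed as the page charset.

```python
import zlib

GZIP_MAGIC = b"\x1f\x8b"  # first two bytes of every gzip stream


def decode_body(raw, charset="utf-8"):
    """Decompress raw if it looks like gzip, then decode it to text.

    charset is an assumption; check the page's Content-Type header.
    """
    if raw[:2] == GZIP_MAGIC:
        # 16 + MAX_WBITS tells zlib to expect a gzip header and trailer.
        raw = zlib.decompress(raw, 16 + zlib.MAX_WBITS)
    return raw.decode(charset)


def gzip_bytes(data):
    """Hypothetical helper for trying decode_body without a network call."""
    compressor = zlib.compressobj(9, zlib.DEFLATED, 16 + zlib.MAX_WBITS)
    return compressor.compress(data) + compressor.flush()
```

In the question's code this would be used as html = decode_body(response.read()), with html replacing the variable named str. Uncompressed bodies pass through unchanged apart from the decode step.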
A small suggestion: for a program this small, a plain function is more convenient than a class, both to write and to use.
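As a rough illustration of the function-based version, assuming nothing beyond the fetch itself is needed. The name fetch_html is hypothetical, and the import fallback is only there so the sketch runs on both Python 2 and Python 3.

```python
try:
    from urllib2 import urlopen  # Python 2, as in the question
except ImportError:
    from urllib.request import urlopen  # Python 3


def fetch_html(url):
    """Return the raw response body for url as bytes (not yet decoded)."""
    response = urlopen(url)
    try:
        return response.read()
    finally:
        response.close()
```

The caller then needs one line, body = fetch_html(baseUrl), instead of creating an object first. The result is still raw bytes, so any decompression or decoding happens afterwards.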
I guess you are using Sublime Text?
Refer to this
It should be encode, not decode. Also, your variable name is the same as the built-in type name str.
It should be encode