sublime-text - Python crawler encoding problem
PHP中文网 2017-04-18 10:10:29
All replies (8)
巴扎黑

Try printing the values without wrapping them in a tuple:

print h2, a

This is most likely a legacy encoding issue in Python 2.

Printing a tuple calls the tuple's __str__(), which shows the repr() of each element, so a unicode string appears as escape sequences:

>>> h = u'你好'
>>> (h, 8).__str__()
"(u'\u4f60\u597d', 8)"
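
To make the point above concrete, here is a minimal sketch that runs on Python 3 (an assumption; the question itself uses Python 2). It shows that the text of a tuple is built from the repr() of each element, and that formatting the elements individually avoids this. The built-in ascii() is used only to approximate how Python 2's repr() escapes a unicode string.

```python
# -*- coding: utf-8 -*-
h = u'你好'
pair = (h, 8)

# str() of a tuple is assembled from repr() of each element, not str().
tuple_text = str(pair)
rebuilt = '(' + ', '.join(repr(item) for item in pair) + ')'

# ascii() escapes non-ASCII characters, roughly what Python 2's repr()
# did for unicode strings (without the leading u prefix).
escaped = ascii(h)

# Formatting the elements one by one uses str() on each of them instead.
line = u'%s, %s' % pair
```

Here tuple_text and rebuilt are identical, while line holds the readable text, so print(line) shows the Chinese characters instead of escapes.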
巴扎黑

The cause is an encoding mismatch. On Windows the system encoding is usually gbk or an isoxxx encoding. Check which encoding the web page uses (you can see it in Chrome), then convert the text to the same encoding as the system and it will print correctly.
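
A rough sketch of that conversion, assuming the page is served as UTF-8 and the console expects GBK. The helper names decode_page and encode_for_console are hypothetical and exist only for this illustration.

```python
# -*- coding: utf-8 -*-

def decode_page(raw_bytes, page_encoding):
    """Turn the raw bytes of a page into text using the page's own encoding."""
    return raw_bytes.decode(page_encoding)


def encode_for_console(text, console_encoding):
    """Encode text for a console; characters it cannot show become '?'."""
    return text.encode(console_encoding, errors='replace')


# Stand-in for the bytes a UTF-8 page would deliver (no network access here).
page_bytes = u'你好'.encode('utf-8')

text = decode_page(page_bytes, 'utf-8')          # correct: matches the page
console_bytes = encode_for_console(text, 'gbk')  # bytes a GBK console expects

# Decoding with the wrong encoding is what produces garbled output.
garbled = page_bytes.decode('gbk', errors='replace')
```

The idea is to decode once with the page's encoding and encode once with the console's encoding, rather than passing bytes through unchanged.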

小葫芦

Printing h2 on its own already outputs the Chinese text correctly. If you really need to print a tuple as you do, see the code below (Python 2):

# -*- coding: utf-8 -*-
# Python 2 only: str.decode() does not exist on Python 3 strings.
from __future__ import unicode_literals

import requests
from bs4 import BeautifulSoup

res = requests.get('http://news.sina.com.cn/china/')
res.encoding = 'utf-8'  # decode the response body as UTF-8
soup = BeautifulSoup(res.text, 'html.parser')

for news in soup.select('.news-item'):
    if len(news.select('h2')) > 0:
        h2 = news.select('h2')[0].text
        a = news.select('a')[0]['href']
        # str() of the tuple contains \uXXXX escapes; decode them back to characters
        test = str((h2, a))
        print(test.decode("unicode-escape"))
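
For readers on Python 3, the same unicode-escape round trip can be sketched as follows. This is only an illustration of the trick, not a recommended approach; the URL is a placeholder, and ascii() stands in for the escaped text that Python 2's str() produces for a tuple.

```python
# -*- coding: utf-8 -*-
h2 = u'你好'
a = 'http://example.com/a'  # placeholder link, not taken from the real page

# ascii() gives the escaped form, e.g. ('\u4f60\u597d', 'http://...')
escaped = ascii((h2, a))

# Encode to bytes, then let the unicode-escape codec turn \uXXXX back
# into the original characters.
restored = escaped.encode('ascii').decode('unicode-escape')
```

Note that the unicode-escape codec also interprets any other backslash sequences in the text, so this is fragile for strings that contain literal backslashes.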
巴扎黑

If you run into encoding problems and want to understand the history behind character encodings, read this article: http://foofish.net/python-cha... After that you will know how to analyze encoding problems when they come up.

大家讲道理

Use Python 3.
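
A short sketch of why Python 3 sidesteps the issue: the repr() of a str keeps printable non-ASCII characters, so the text of a tuple is readable as-is. The values below are placeholders, not data from the real page.

```python
# -*- coding: utf-8 -*-
h2 = '你好'
a = 'http://example.com/a'  # placeholder link

# This is the text that print((h2, a)) displays in Python 3.
shown = repr((h2, a))
```

In Python 3, shown contains the Chinese characters themselves rather than \uXXXX escapes, provided the console can display them.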

PHPzhong

A leading u'' means the value is already unicode. The encoding itself is fine; the problem is the way you print. In Python 2.7, changing it to this should work:

print '%s,%s'%(h2, a)
小葫芦

After reading the values, just convert them directly into a string.

Ty80

print(h2 + a)
