python - Encoding problem when saving crawled content to a text file - PHP中文网 Q&A


欧阳克 2017-06-12 09:26:21


I am testing a very simple crawler that saves the text content of a very plain web page to my local computer. It ends with this error:

UnicodeEncodeError                        Traceback (most recent call last)
<ipython-input-35-ead5570b2e15> in <module>()
      7     filename=str(i)+'.txt'
      8     with open(filename,'w')as f:
----> 9         f.write(content)
     10         print('Chapter {} of the novel has been downloaded'.format(i))
     11         f.close()

UnicodeEncodeError: 'gbk' codec can't encode character '\xa0' in position 7: illegal multibyte sequence

The code is as follows:

In [1]: import requests

In [2]: from bs4 import BeautifulSoup

In [3]: re=requests.get('http://www.qu.la/book/168/')

In [4]: html=re.text

In [5]: soup=BeautifulSoup(html,'html.parser')

In [6]: list=soup.find(id="list")
In [9]: link_list=list.find_all('a')
In [14]: mylist=[]
    ...: for link in link_list:
    ...:     mylist.append('http://www.qu.la'+link.get('href'))
    ...:
    ...:
    
# Loop over every link and save its text content to a local text file
In [35]: i=0
    ...: for url in mylist:
    ...:     re1=requests.get(url)
    ...:     html2=re1.text
    ...:     soup=BeautifulSoup(html2,"html.parser")
    ...:     content=soup.find(id="content").text.replace('chaptererror();', '')
    ...:     filename=str(i)+'.txt'
    ...:     with open(filename,'w')as f:
    ...:         f.write(content)
    ...:         print('Chapter {} of the novel has been downloaded'.format(i))
    ...:         f.close()
    ...:     i=i+1
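
For context, the error can probably be reproduced without any network access. The sketch below assumes two things that the traceback suggests but the post does not state outright: that the page text contains a non-breaking space (U+00A0, which is what '\xa0' denotes and which usually comes from `&nbsp;` in the HTML), and that `open(filename,'w')` fell back to the platform's default encoding, which was GBK on this machine. The sample string and the `can_encode` helper are made up for illustration.

```python
import locale

# Made-up sample text containing a non-breaking space (U+00A0).
sample = 'Chapter\xa0one'

# The encoding that text-mode open() has traditionally used when encoding= is omitted.
print(locale.getpreferredencoding(False))


def can_encode(text, encoding):
    """Return True if text can be encoded with the given codec."""
    try:
        text.encode(encoding)
        return True
    except UnicodeEncodeError:
        return False


gbk_ok = can_encode(sample, 'gbk')      # GBK has no mapping for U+00A0
utf8_ok = can_encode(sample, 'utf-8')   # UTF-8 can encode any Unicode character
print(gbk_ok, utf8_ok)
```

If the first line printed is a GBK-family codec such as cp936, that would be consistent with the 'gbk' codec named in the error message.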


All replies (1)

给我你的怀抱 2017-06-12 09:28:21 (reply #1)

with open(filename, 'wb') as f:  # binary mode, because encode() returns bytes
    f.write(content.encode('utf-8'))

Or

import codecs
with codecs.open(filename, 'w', 'utf-8') as f:
    f.write(content)
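
On Python 3, which the traceback format suggests is in use here, the built-in `open()` also accepts an `encoding` argument, so `codecs` is not strictly needed. The following is a minimal sketch of that variant, not the original poster's code: `save_chapter` is a hypothetical helper name, and the sample text and temporary path stand in for the scraped chapter and its file.

```python
import os
import tempfile


def save_chapter(path, content):
    """Write content to path as UTF-8 text (hypothetical helper, not from the thread)."""
    # An explicit encoding avoids depending on the platform default (GBK in the question).
    with open(path, 'w', encoding='utf-8') as f:
        f.write(content)


# Made-up sample text with a non-breaking space, standing in for a scraped chapter.
sample = 'Chapter\xa0one'
path = os.path.join(tempfile.mkdtemp(), '0.txt')
save_chapter(path, sample)

# Read the file back with the same encoding to check the round trip.
with open(path, encoding='utf-8') as f:
    round_trip = f.read()
print(round_trip == sample)
```

In the loop from the question this would amount to changing `open(filename,'w')` to `open(filename,'w',encoding='utf-8')`. The file then has to be read back as UTF-8 as well, since some Windows tools may otherwise assume GBK.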
