Should I use the Scrapy framework or libraries such as requests and bs4 to write a crawler in Python? - Python Tutorial - php.cn


WBOY

Jun 06, 2016 pm 04:23 PM

python python3 requests scrapy

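I want to write a crawler in Python (Python 3) to meet some needs of my own.
After reading material online, I see two candidate approaches:
1. Use the Scrapy framework
Everyone says the framework is powerful and easy to use, but it is not compatible with Python 3.
2. Build it myself with libraries such as requests and bs4
Compared with option 1, I would probably have to write a lot more code myself, and performance might not match an open-source framework.

I learned Python 3 (many people say Python 3 is the trend, so I never learned Python 2). With option 1, the problem is that Scrapy's Python 3 support is not good enough (the Scrapy website says Python 3 support is in progress, but that is not the same as being supported). Could someone familiar with it explain how good Scrapy's Python 3 support actually is? With option 2, if I want to build a simplified version of Scrapy from requests, bs4, and similar libraries, how hard would that be, and what would I need to learn?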

Replies:

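Really, don't agonize over 2 versus 3. For a crawler you won't notice the difference; apart from encoding and print, none of it matters.
And I believe requests and bs4 both support Python 3 (let me confirm).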

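So what does matter?
1. IP restrictions
Use proxies with requests: buy proxies, or use free ones found online.
2. Disguising the crawler as a browser
Change the User-Agent in requests.
3. Logging in first and keeping the cookies
Use a requests session: POST the login first to obtain the cookies, then crawl.
4. Too many URL parameters whose meaning is unclear
Use WebDriver and PhantomJS.
5. JavaScript and Ajax
Use the browser's F12 developer tools to work out the request pattern, then issue those requests directly with requests. Or use WebDriver and PhantomJS; with Scrapy, use scrapyjs.
6. Crawling is too slow
Use multithreading. Don't bring up the GIL: usually the network I/O is the slow part, and the CPU is just waiting on I/O.
7. Still slow
Use Scrapy's asynchronous engine (I have used it on several projects and it works well) or pyspider (which supports Python 3).
8. Still slow
Go distributed (I haven't got into this yet): redis, scrapyd.
9. CAPTCHAs
Sorry, I can't help you there. Simple ones can be handled with PIL: convert to grayscale, binarize, segment, and recognize.
10. If you want to implement asynchronous requests yourself
grequests is good.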
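Items 1 to 3 above (proxy, User-Agent, login cookies) can be sketched with a single `requests.Session`. This is a minimal sketch, not a definitive setup: the proxy address, the User-Agent string, and the helper names `build_session` and `login` are hypothetical placeholders, and a real site's login form will use its own field names.

```python
import requests

# Hypothetical placeholder values; substitute your own proxy and User-Agent.
USER_AGENT = (
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) "
    "AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0 Safari/537.36"
)
PROXIES = {
    "http": "http://127.0.0.1:8080",
    "https": "http://127.0.0.1:8080",
}


def build_session(user_agent=USER_AGENT, proxies=None):
    """Return a Session that sends a browser-like User-Agent, optionally via a proxy."""
    session = requests.Session()
    session.headers.update({"User-Agent": user_agent})
    if proxies:
        session.proxies.update(proxies)
    return session


def login(session, login_url, form_data):
    """POST the login form; cookies set by the server stay on the session."""
    response = session.post(login_url, data=form_data, timeout=10)
    response.raise_for_status()
    return response
```

After `login(session, login_url, {...})` succeeds, later calls such as `session.get(page_url)` reuse the same cookies, headers, and proxy settings, which is why the session is created once and passed around.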

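Replying from my phone; more to come.
P.S. Without noticing it, I have been using Python for a while now. I have written crawlers and web apps, and recently made a bit of money with Python.

A few days ago I wrote a simple crawler myself with a few libraries. I used Python 2.7, though, so some things may differ. First, my experience.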

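A little over two months ago I learned the Scrapy framework and then wrote a few crawlers, mostly with BaseSpider and CrawlSpider. Writing a crawler felt easy at the time: with a ready-made framework in place, you only have to define the classes for what you want to scrape and the functions that do the scraping.

Then other things interrupted my Python study for more than a month. Later, while reading Core Python Programming, I reached the part on crawlers and thought: why not write one myself? So I started.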

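Only then did I realize that writing a crawler is not as simple as I had thought. You have to handle a whole series of things yourself, such as a class for storing data, a function that keeps only same-domain links, and de-duplication. In the end I wrote it two ways, one built around a class and one using only functions, though the two are basically alike. Plenty of problems are still unsolved. For now it crawls from a given URL down to a given depth, but the basic prototype is there and I will fix the rest over time.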
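The pieces listed above (a start URL, a depth limit, a same-domain filter, and de-duplication) fit in a short breadth-first loop. The sketch below is an assumption about how such a crawler might look, not the poster's actual code. The names `crawl` and `HREF_RE` are hypothetical, and `fetch` is any callable that maps a URL to its HTML text, so the download step can be swapped freely.

```python
import re
from collections import deque
from urllib.parse import urljoin, urlparse

# Naive href matcher; adequate for a sketch, not for arbitrary HTML.
HREF_RE = re.compile(r'href=["\'](.*?)["\']', re.IGNORECASE)


def crawl(start_url, max_depth, fetch):
    """Breadth-first crawl from start_url; return the URLs visited, in order.

    fetch is a caller-supplied function: URL in, HTML text out.
    """
    domain = urlparse(start_url).netloc
    seen = {start_url}                    # de-duplication set
    queue = deque([(start_url, 0)])
    visited = []
    while queue:
        url, depth = queue.popleft()
        html = fetch(url)
        visited.append(url)
        if depth >= max_depth:
            continue                      # do not follow links past the depth limit
        for href in HREF_RE.findall(html):
            link = urljoin(url, href).split("#")[0]   # absolute URL, fragment dropped
            if urlparse(link).netloc != domain:
                continue                  # keep same-domain links only
            if link not in seen:
                seen.add(link)
                queue.append((link, depth + 1))
    return visited
```

Passing `fetch` in as a parameter keeps the traversal logic testable without network access; data storage, the remaining item on the poster's list, would hook in where `visited.append(url)` sits.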

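Personally, I recommend learning Scrapy first. The biggest benefit I got from it was learning regular expressions, so much so that when I later wrote my own crawler I extracted URLs directly with regexes and used no other libraries.

After learning Scrapy, try writing a crawler yourself. By then you have a grasp of the basic crawling workflow, so surely you can follow the pattern. As the asker suspects, requests and bs4 alone are definitely not enough, but don't rush to learn more libraries; look them up when you need them. (I personally like regexes, so my own crawler uses only re. Those two libraries are undeniably powerful too; it is just personal preference.) You will certainly hit problems while writing it, such as data storage, de-duplication, and fetching. Solve them one at a time; it will definitely make you better.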

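Watching a crawler you wrote yourself work through web pages is also very satisfying.

Don't agonize over Python 2 versus Python 3.
Learning to program is not just learning syntax; it is learning computational thinking and how to approach problems. Python 2 and Python 3 do not differ much.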

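In your situation, I suggest learning crawling with the standard library or the requests library first. Learn the basic skills, such as capturing traffic, simulating POST and GET requests, and filling in forms automatically, and then learn the Scrapy framework.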
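To make the GET versus POST distinction concrete, here is a minimal sketch using only the Python 3 standard library (`urllib.request`). The URLs, the form fields, and the helper names `build_get`, `build_post`, and `send` are hypothetical, chosen for illustration only.

```python
from urllib.parse import urlencode
from urllib.request import Request, urlopen

# Hypothetical endpoints, for illustration only.
SEARCH_URL = "https://example.com/search"
FORM_URL = "https://example.com/form"


def build_get(url, params):
    """GET: the parameters travel in the query string."""
    return Request(url + "?" + urlencode(params))


def build_post(url, fields):
    """POST: the form fields travel URL-encoded in the request body."""
    return Request(url, data=urlencode(fields).encode("utf-8"))


def send(request, timeout=10):
    """Send a prepared request and return the decoded response body."""
    with urlopen(request, timeout=timeout) as response:
        return response.read().decode("utf-8", errors="replace")
```

Comparing what the browser's network panel shows against `request.full_url` and `request.data` is a quick way to check that a simulated form submission matches the real one before calling `send`.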

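I suggest watching the video by Huang Ge (黄哥) on writing a Python crawler for search suggestions, to learn the basics.

Search for “python爬虫联想词视频” to find where to watch it.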

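Keep it up!

Try urllib and urllib2 first to get familiar with the basic thinking behind crawlers. Once you have a rough feel for it, look at requests, which offers a higher-level API (it is built on urllib3 rather than on urllib/urllib2). Get comfortable with capturing traffic and analyzing page structure, understand how POST and GET work and how they are used, and try writing crawlers for a few small sites. When that no longer satisfies you, move on to Scrapy. Before diving in, though, I recommend that the asker first learn Python's multithreading, which I am grinding through myself right now.

It depends on your use case.
If your crawler is just for fun or practice, or the request concurrency against a given site is low, you can use Scrapy.
If your crawler hits a given site very frequently and in large volume, I lean toward requests + bs4 + re.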

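The business logic of a crawler is simple. The hard part is anti-scraping, anti-scraping, anti-scraping!

Scrapy's strength is that it abstracts the business logic: you configure the data format you need, and it helps you get results quickly. That is convenient when the request volume is small, but once the volume grows you will inevitably run into anti-scraping mechanisms that block you in various ways, and Scrapy does not provide a particularly effective way to handle them.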

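Besides, extracting the useful data can often be done with BeautifulSoup + re alone, and the pile of things you have to configure just to use Scrapy ends up feeling cumbersome.
Since all the anti-scraping handling has to be done yourself anyway, Scrapy's advantage is already quite small, so I suggest doing it with requests + bs4 + re.

The requests and bs4 libraries are quite powerful. Write a few dozen simple lines, add proxies and multiprocessing or multithreading, and you can scrape a considerable amount of data. If the asker wants to get started with these two libraries, search NetEase Cloud Classroom for a Python crawler course; I forget the exact name, but I thought it was taught well. Beyond that, make good use of the documentation: everything is explained there, and a Baidu search will find it.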
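The "few dozen lines plus multithreading" approach can be sketched with a thread pool. Threads suit this job because each one spends most of its time waiting on the network. This is a hedged sketch: `fetch` and `fetch_all` are hypothetical names, and the default downloader uses the standard library's `urlopen` rather than requests so the block has no third-party dependency.

```python
from concurrent.futures import ThreadPoolExecutor
from urllib.request import urlopen


def fetch(url, timeout=10):
    """Download one page; swap in a requests-based function if you prefer."""
    with urlopen(url, timeout=timeout) as response:
        return response.read().decode("utf-8", errors="replace")


def fetch_all(urls, fetch_one=fetch, workers=8):
    """Fetch urls concurrently; return {url: body, or the exception raised}."""
    def safe(url):
        try:
            return fetch_one(url)
        except Exception as exc:   # keep one failed page from aborting the batch
            return exc

    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map yields results in the same order as the input URLs.
        return dict(zip(urls, pool.map(safe, urls)))
```

Keeping `workers` small is also the polite choice: a high request rate against one site is exactly what triggers the blocking discussed above.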
