
Python crawler: configuring a proxy in Scrapy

高洛峰

Released: 2016-10-17 13:56:57

When crawling websites, the most common problem is that the site restricts requests by IP address as an anti-crawling measure. The usual workaround is to rotate IP addresses by sending requests through proxies.

This article shows how to configure a proxy in Scrapy and crawl through it.

1. Create a new file "middlewares.py" in the Scrapy project directory

# base64 is needed only if the proxy requires authentication
import base64


# Downloader middleware that routes every request through a proxy
class ProxyMiddleware(object):
    # Scrapy calls process_request for each outgoing request
    def process_request(self, request, spider):
        # Set the location of the proxy
        request.meta['proxy'] = "http://YOUR_PROXY_IP:PORT"

        # Use the following lines only if your proxy requires authentication
        proxy_user_pass = "USERNAME:PASSWORD"
        # Build the Basic auth token; b64encode works on bytes and, unlike the
        # removed base64.encodestring, adds no trailing newline
        encoded_user_pass = base64.b64encode(proxy_user_pass.encode('utf-8'))
        request.headers['Proxy-Authorization'] = b'Basic ' + encoded_user_pass
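The middleware above always uses one fixed proxy, while the introduction recommends rotating IPs. A minimal sketch of rotation might look like the following. It assumes a custom setting named PROXY_LIST (a hypothetical name chosen for this example, not a built-in Scrapy setting) that holds a list of proxy URLs, and it leaves out authentication to stay short.

```python
import random


class RotatingProxyMiddleware:
    """Sketch: pick a random proxy from a list for every request."""

    def __init__(self, proxies):
        self.proxies = list(proxies)
        if not self.proxies:
            raise ValueError("at least one proxy URL is required")

    @classmethod
    def from_crawler(cls, crawler):
        # PROXY_LIST is a hypothetical custom setting defined in settings.py,
        # e.g. PROXY_LIST = ["http://10.0.0.1:8080", "http://10.0.0.2:8080"]
        return cls(crawler.settings.getlist('PROXY_LIST'))

    def process_request(self, request, spider):
        # Choose a different exit IP at random for each outgoing request
        request.meta['proxy'] = random.choice(self.proxies)
        # Returning None tells Scrapy to keep processing the request
        return None
```

Random choice is the simplest policy. A real crawler would probably also want to drop proxies that keep failing, which this sketch does not attempt.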


2. Add the following to the project configuration file (./pythontab/settings.py)

DOWNLOADER_MIDDLEWARES = {
    # Current Scrapy releases use scrapy.downloadermiddlewares;
    # the old scrapy.contrib.downloadermiddleware path has been removed
    'scrapy.downloadermiddlewares.httpproxy.HttpProxyMiddleware': 110,
    'pythontab.middlewares.ProxyMiddleware': 100,
}
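The numbers in DOWNLOADER_MIDDLEWARES are ordering values: Scrapy calls each middleware's process_request in increasing order, so the custom ProxyMiddleware (100) sets request.meta['proxy'] before HttpProxyMiddleware (110) handles the request. The sketch below only imitates that ordering rule with plain Python and does not call Scrapy itself. The helper request_order is a hypothetical name introduced for illustration, and the real framework also merges in its built-in default middlewares.

```python
# Same mapping as in settings.py (module path as used by current Scrapy releases)
DOWNLOADER_MIDDLEWARES = {
    'scrapy.downloadermiddlewares.httpproxy.HttpProxyMiddleware': 110,
    'pythontab.middlewares.ProxyMiddleware': 100,
}


def request_order(middlewares):
    """Return middleware paths in the order their process_request hooks run."""
    # A value of None disables a middleware, so skip those entries
    enabled = {path: order for path, order in middlewares.items()
               if order is not None}
    return sorted(enabled, key=enabled.get)


for path in request_order(DOWNLOADER_MIDDLEWARES):
    print(DOWNLOADER_MIDDLEWARES[path], path)
# prints:
# 100 pythontab.middlewares.ProxyMiddleware
# 110 scrapy.downloadermiddlewares.httpproxy.HttpProxyMiddleware
```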
