Web Scraping with Rotating Proxies: An Example with Python Requests and Selenium


DDD

Published: 2024-11-01 13:01:29


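Scraping through rotating proxies is an effective approach, especially when you need to access a website frequently or get past anti-scraping mechanisms. A rotating proxy changes the outgoing IP address automatically, which reduces the risk of being blocked.

The following examples show how to use rotating proxies for web scraping with Python's requests library and with Selenium.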

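Using the requests library

1. Install the necessary library:

First, install the requests library (for example, with pip install requests).

2. Configure the rotating proxy:

Obtain an API key or a proxy list from your rotating proxy service provider, then configure it in your requests.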
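If your provider gives you a plain proxy list rather than a single rotating endpoint, one simple way to rotate is round-robin. The sketch below is only an illustration: `PROXY_POOL` and `next_proxies` are hypothetical names, the addresses are placeholders, and it assumes plain HTTP proxies given as `host:port` strings.

```python
from itertools import cycle

# Hypothetical proxy list; replace with the entries from your provider.
PROXY_POOL = [
    "203.0.113.10:8080",
    "203.0.113.11:8080",
    "203.0.113.12:8080",
]

# cycle() yields the entries in order and starts over when the list ends.
_rotation = cycle(PROXY_POOL)


def next_proxies():
    """Return a requests-style proxies mapping for the next proxy in the pool."""
    proxy = next(_rotation)
    # Assumption: plain HTTP proxies, so the same http:// proxy URL is used
    # for both HTTP and HTTPS targets.
    proxy_url = f"http://{proxy}"
    return {"http": proxy_url, "https": proxy_url}
```

Each call returns the mapping for the next proxy, so it can be passed directly as `requests.get(url, proxies=next_proxies(), timeout=10)`.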


3. Send the request:

Use the requests library to send the HTTP request and route it through the proxy.

Sample code:

import requests
from some_rotating_proxy_service import get_proxy  # Hypothetical module: stands in for the function your rotating proxy service provides

# Get a new proxy (assumed here to be a "host:port" string)
proxy = get_proxy()

# Route both HTTP and HTTPS traffic through the proxy. The proxy URL itself
# normally uses http:// unless your provider documents a TLS proxy endpoint.
proxies = {
    'http': f'http://{proxy}',
    'https': f'http://{proxy}'
}

# Send a GET request
url = 'http://example.com'
try:
    response = requests.get(url, proxies=proxies, timeout=10)
    response.raise_for_status()  # Treat HTTP error statuses as failures
    # Process the response data
    print(response.text)
except requests.exceptions.ProxyError:
    print('Proxy error occurred')
except requests.exceptions.RequestException as e:
    print(f'An error occurred: {e}')
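A single proxy can fail or be blocked, so in practice it often helps to retry the request through a different proxy. The following is a minimal sketch under stated assumptions, not a definitive implementation: `fetch_with_rotation` is a hypothetical helper, `get_proxy` is whatever function your service provides (assumed to return a `host:port` string), and the `http_get` parameter exists only so the HTTP call can be swapped out.

```python
import requests


def fetch_with_rotation(url, get_proxy, max_attempts=3, timeout=10,
                        http_get=requests.get):
    """Try the request through up to max_attempts different proxies.

    get_proxy is a caller-supplied function returning a "host:port" string
    (hypothetical; use whatever your proxy service provides).
    """
    last_error = None
    for _ in range(max_attempts):
        proxy_url = f"http://{get_proxy()}"
        proxies = {"http": proxy_url, "https": proxy_url}
        try:
            response = http_get(url, proxies=proxies, timeout=timeout)
            response.raise_for_status()
            return response
        except requests.exceptions.RequestException as exc:
            # Proxy failures, timeouts and HTTP error statuses all land here;
            # remember the error and move on to the next proxy.
            last_error = exc
    raise RuntimeError(f"All {max_attempts} attempts failed") from last_error
```

Catching `requests.exceptions.RequestException` covers proxy errors, timeouts, and the `HTTPError` raised by `raise_for_status()`, so any of these triggers a switch to the next proxy.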


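Using Selenium

1. Install the necessary library and driver:

Install the Selenium library and a WebDriver for your browser (for example, ChromeDriver).

2. Configure the rotating proxy:

As with requests, obtain the proxy details from your rotating proxy service provider and configure them in Selenium.

3. Launch the browser and set the proxy:

Use Selenium to launch the browser and set the proxy through the browser options.

Sample code: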

from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from some_rotating_proxy_service import get_proxy  # Hypothetical module: stands in for the function your rotating proxy service provides

# Get a new proxy (assumed here to be a "host:port" string)
proxy = get_proxy()

# Set Chrome options to use the proxy
chrome_options = Options()
chrome_options.add_argument(f'--proxy-server=http://{proxy}')

# Launch the Chrome browser
driver = webdriver.Chrome(options=chrome_options)
try:
    # Visit the website
    url = 'http://example.com'
    driver.get(url)

    # Process the page data
    # ... (for example, read driver.page_source to get the page's HTML, or use driver to find a specific element)
finally:
    # Close the browser even if an error occurred above
    driver.quit()
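Because `--proxy-server` is a launch argument, changing the proxy in this setup means starting a new browser instance. The sketch below shows one way to wrap that in a function that always releases the browser. `proxy_argument` and `fetch_page_source` are hypothetical helper names, and the proxy is assumed to be a plain HTTP proxy given as `host:port`.

```python
def proxy_argument(proxy):
    """Build the Chrome command-line switch for a "host:port" HTTP proxy."""
    return f"--proxy-server=http://{proxy}"


def fetch_page_source(url, proxy):
    """Open url in a fresh Chrome instance routed through proxy."""
    # Imported here so proxy_argument() stays usable without Selenium installed.
    from selenium import webdriver
    from selenium.webdriver.chrome.options import Options

    options = Options()
    options.add_argument(proxy_argument(proxy))
    driver = webdriver.Chrome(options=options)
    try:
        driver.get(url)
        return driver.page_source
    finally:
        # Always release the browser, even if the page load fails.
        driver.quit()
```

Calling `fetch_page_source(url, get_proxy())` once per page gives each page load its own proxy, at the cost of a browser start-up each time.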


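Notes

Make sure the rotating proxy service is reliable and offers a large enough proxy pool, so that addresses are not reused too often and blocked.
Plan your scraping tasks around the pricing and usage limits of the rotating proxy service.
When using Selenium, take care to close browser windows and release resources, to avoid memory leaks and other problems.
Comply with the target website's robots.txt file and its scraping policy to avoid legal disputes.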
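The robots.txt check mentioned above can be automated with Python's standard library. The following sketch uses `urllib.robotparser`; the helper name `is_allowed`, the user agent string, and the example rules are made up for illustration.

```python
from urllib.robotparser import RobotFileParser


def is_allowed(robots_txt, user_agent, url):
    """Return True if the robots.txt text permits user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# Example rules, made up for illustration.
EXAMPLE_ROBOTS = """\
User-agent: *
Disallow: /private/
"""

print(is_allowed(EXAMPLE_ROBOTS, "my-scraper", "http://example.com/index.html"))
print(is_allowed(EXAMPLE_ROBOTS, "my-scraper", "http://example.com/private/data"))
```

Against a live site, the parser can fetch the file itself: call `set_url()` with the site's robots.txt URL, then `read()`, instead of passing the text to `parse()`.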
