How to use Python to crawl CSDN popular comment URLs and store them in Redis

WBOY
Release: 2023-05-28 15:17:23

1. Configure webdriver

Download ChromeDriver (the driver for Google Chrome) in the version that matches your installed browser, then configure it.

import time
import random
from PIL import Image
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

if __name__ == '__main__':
    options = webdriver.ChromeOptions()
    # Path to the Chrome executable on the author's machine
    options.binary_location = r'C:\Users\hhh\AppData\Local\Google\Chrome\Application\谷歌浏览器.exe'
    # driver=webdriver.Chrome(executable_path=r'D:\360Chrome\chromedriver\chromedriver.exe')
    driver = webdriver.Chrome(options=options)
    # Use the Java section as an example
    driver.get('https://www.csdn.net/nav/java')
    # Scroll to the bottom repeatedly so the feed loads more entries
    for i in range(1, 20):
        driver.execute_script("window.scrollTo(0, document.body.scrollHeight)")
        time.sleep(2)
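The loop above scrolls a fixed number of times. As an alternative, here is a minimal sketch that stops once the page height no longer grows. The function name `scroll_until_stable` and its parameters are hypothetical and not part of the original article; it only relies on the driver's `execute_script` method, which is already used above.

```python
import time


def scroll_until_stable(driver, pause=2.0, max_rounds=20):
    """Scroll to the bottom until the page height stops growing.

    Returns the number of scrolls performed (at most max_rounds).
    """
    last_height = driver.execute_script("return document.body.scrollHeight")
    rounds = 0
    while rounds < max_rounds:
        driver.execute_script("window.scrollTo(0, document.body.scrollHeight)")
        rounds += 1
        time.sleep(pause)  # give the feed time to load more entries
        new_height = driver.execute_script("return document.body.scrollHeight")
        if new_height == last_height:
            break  # nothing new was loaded, so stop early
        last_height = new_height
    return rounds
```

Calling `scroll_until_stable(driver)` after `driver.get(...)` could replace the fixed `for` loop. Whether this loads more or fewer entries than 19 fixed scrolls depends on how the page loads its feed.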

2. Get the URL

from bs4 import BeautifulSoup
from lxml import etree

# Parse the rendered page and extract the link of every entry in the feed list
html = etree.HTML(driver.page_source)
# soup = BeautifulSoup(html, 'lxml')
# soup_herf=soup.find_all("#feedlist_id > li:nth-child(1) > div > div > h2 > a")
# soup_herf
title = html.xpath('//*[@id="feedlist_id"]/li/div/div/h2/a/@href')
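The XPath query returns a plain list of `href` strings in `title`. That list may contain duplicates or entries that are not absolute links, so it can help to clean it before storing it. The sketch below is an assumption about what such a cleanup might look like; the helper `clean_urls` and the sample URLs are hypothetical.

```python
from urllib.parse import urlparse


def clean_urls(urls):
    """Drop non-HTTP(S) entries and duplicates, keeping first-seen order."""
    seen = set()
    cleaned = []
    for url in urls:
        url = url.strip()
        if urlparse(url).scheme not in ("http", "https"):
            continue  # skips empty strings, relative paths, javascript: links
        if url not in seen:
            seen.add(url)
            cleaned.append(url)
    return cleaned


# Hypothetical sample input, standing in for the `title` list above
sample = [
    "https://blog.csdn.net/a/article/details/1",
    " https://blog.csdn.net/a/article/details/1 ",
    "",
    "/nav/java",
    "javascript:void(0)",
    "https://blog.csdn.net/b/article/details/2",
]
print(clean_urls(sample))
```

With the article's variables, this would be used as `title = clean_urls(title)`.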

As you can see, a large number of URLs are crawled in one pass, and it is very fast.

3. Write to Redis

After importing the redis package, configure the Redis host, port, and database, then use the rpush function to write the URLs.
Open Redis.

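import redis

# Connect to the local Redis server, database 1;
# decode_responses=True returns str values instead of bytes
r_link = redis.Redis(port=6379, host='localhost', decode_responses=True, db=1)
for u in title:
    print("Preparing to write {}".format(u))
    r_link.rpush("csdn_url", u)
    print("{} written successfully!".format(u))
print('=' * 30, '\n', "Total URLs written: {}".format(len(title)), '\n', '=' * 30)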
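The loop above sends one RPUSH command per URL. Because `rpush` in redis-py accepts several values in a single call, the whole list can also be written in one round trip. The following is a minimal sketch under that assumption; the helper name `push_urls` is hypothetical, and `client` stands for a `redis.Redis` instance such as `r_link`.

```python
def push_urls(client, key, urls):
    """Append all URLs to the Redis list `key` with a single RPUSH call.

    `client` is expected to be a redis.Redis instance (or any object with a
    compatible rpush method). Returns the list length reported by Redis, or
    0 if there was nothing to write.
    """
    urls = list(urls)
    if not urls:
        return 0  # RPUSH needs at least one value, so skip the call
    return client.rpush(key, *urls)


# Usage with the connection from the article (requires a running Redis server):
# import redis
# r_link = redis.Redis(host="localhost", port=6379, db=1, decode_responses=True)
# push_urls(r_link, "csdn_url", title)
```

For very large lists it may be preferable to push in chunks, since a single command carrying many thousands of values produces one large request.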

Done!

You can see the result in Redis Desktop Manager. Crawling and writing are both very fast.
To consume the URLs, use rpop to pop them off the list. Because they were added with rpush, rpop removes them from the same end, so the list behaves like a stack.

# Pop URLs from the right end of the list until it is empty
one_url = r_link.rpop("csdn_url")
while one_url:
    print("{} popped!".format(one_url))
    one_url = r_link.rpop("csdn_url")
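Since the URLs were written with rpush, popping with rpop returns the most recently added URL first (stack order), while lpop would return them in the order they were crawled (queue order). The sketch below wraps both options in a generator; the name `pop_all` and the `fifo` flag are hypothetical, and `client` stands for a `redis.Redis` instance such as `r_link`.

```python
def pop_all(client, key, fifo=False):
    """Yield items popped from the Redis list `key` until it is empty.

    fifo=False uses RPOP (last pushed, first out, like a stack);
    fifo=True uses LPOP (first pushed, first out, like a queue).
    """
    pop = client.lpop if fifo else client.rpop
    while True:
        item = pop(key)
        if item is None:  # Redis returns nil once the list is empty
            break
        yield item


# Usage with the connection from the article (requires a running Redis server):
# for url in pop_all(r_link, "csdn_url", fifo=True):
#     print("{} popped!".format(url))
```

Comparing against `None` rather than testing truthiness means an empty string stored in the list would not end the loop early.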

source:yisu.com