Comparison of four commonly used methods of locating elements in Python crawlers, which one do you prefer?
We take the titles of the first 20 books as an example. First, check whether the website has anti-crawling measures in place, that is, whether a plain request directly returns the content to be parsed.
Inspecting the response shows that all the required data is present in the returned content, so no special handling of anti-crawling measures is needed.
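The check described above can be sketched roughly as follows. This is only an illustration, not the article's original code: the helper name `contains_book_list` and both sample strings are hypothetical, and in practice the `html` argument would be the text of the response fetched from the page being crawled.

```python
def contains_book_list(html: str, marker: str = "bang_list") -> bool:
    """Return True if the raw response text already contains the list markup.

    `marker` is an assumed substring to look for; here it is part of the
    class name of the list that holds the books.
    """
    return marker in html


# Hypothetical responses: one with the data in the HTML, one rendered by script.
sample_static = '<ul class="bang_list clearfix bang_list_mode"><li>...</li></ul>'
sample_dynamic = '<div id="app"></div><script src="bundle.js"></script>'

print(contains_book_list(sample_static))
print(contains_book_list(sample_dynamic))
```

If the marker is missing from the raw response, the data is probably loaded by JavaScript or blocked, and plain HTML parsing would not be enough.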
Inspecting the page elements shows that each book's information is contained in an li element, inside the ul whose class is bang_list clearfix bang_list_mode.
Looking further, you can also find where the book title sits within each li, which is the key reference for all of the parsing methods.
1. Traditional BeautifulSoup operation
The classic BeautifulSoup approach imports the class with from bs4 import BeautifulSoup, converts the text into a standardized tree structure with soup = BeautifulSoup(html, "lxml"), and then parses it with the find family of methods. The code is as follows:
        title = li.find('div', class_='name').find('a')['title']  # parse each item to get the book title
        print(title)

if __name__ == '__main__':
    bs_for_parse(response)
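For readers who want something runnable end to end, here is a minimal sketch of the BeautifulSoup approach under stated assumptions. The `SAMPLE_HTML` markup is hypothetical and only mirrors the ul / li / div.name / a structure described above; the `limit` parameter and the returned list are additions for illustration. The built-in "html.parser" is used here in place of the article's "lxml" so that no extra parser needs to be installed; either should work for markup this simple.

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup that mirrors the structure described in the article.
SAMPLE_HTML = """
<ul class="bang_list clearfix bang_list_mode">
  <li><div class="name"><a href="#" title="Book One (Hardcover Edition)">Book One...</a></div></li>
  <li><div class="name"><a href="#" title="Book Two">Book Two</a></div></li>
</ul>
"""


def bs_for_parse(html, limit=20):
    """Return up to `limit` book titles found in the list markup."""
    soup = BeautifulSoup(html, "html.parser")
    # class_ with a single name matches any element that carries that class.
    book_list = soup.find("ul", class_="bang_list")
    if book_list is None:
        return []
    titles = []
    for li in book_list.find_all("li")[:limit]:
        # The full title is stored in the `title` attribute of the link.
        link = li.find("div", class_="name").find("a")
        titles.append(link["title"])
    return titles


if __name__ == "__main__":
    for title in bs_for_parse(SAMPLE_HTML):
        print(title)
```

Returning the titles as a list instead of printing inside the loop makes the function easier to reuse and to compare against the other parsing methods.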
All 20 book titles are obtained successfully. Some of them are lengthy and can be trimmed with regular expressions or other string methods, which this article does not cover in detail.
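As one possible way to trim such titles, the sketch below cuts a title at its first opening parenthesis. The rule itself is an assumption made for illustration, and `shorten_title` is a hypothetical helper name; real titles may need a different pattern.

```python
import re


def shorten_title(title: str) -> str:
    """Keep only the text before the first ASCII or full-width '('."""
    # maxsplit=1 stops at the first parenthesis; strip() removes leftover spaces.
    return re.split(r"[(（]", title, maxsplit=1)[0].strip()


print(shorten_title("Book One (Hardcover Edition)"))
print(shorten_title("Book Two"))
```

Titles without a parenthesis are returned unchanged apart from surrounding whitespace.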