A web scraper can extract the value of an element in several ways. The following are some of the most common:
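Using regular expressions (the re module):

import re

html = "<a href='https://www.example.com'>Example</a>"
# Capture the href value and the link text of every <a> element
links = re.findall(r"<a.*?href=['\"](.*?)['\"].*?>(.*?)</a>", html)
for link in links:
    url = link[0]   # first capture group: the href value
    text = link[1]  # second capture group: the text between the tags
    print("URL:", url)
    print("Text:", text)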
Using BeautifulSoup:

from bs4 import BeautifulSoup

html = "<h1>This is a title</h1>"
soup = BeautifulSoup(html, 'html.parser')
# find_all returns every matching tag; .text gives the text inside it
titles = soup.find_all('h1')
for title in titles:
    print("Title:", title.text)
Using lxml with XPath:

from lxml import etree

html = "<p>This is a paragraph.</p>"
tree = etree.HTML(html)
# The XPath expression //p selects every <p> element in the document
paragraphs = tree.xpath('//p')
for paragraph in paragraphs:
    print("Text:", paragraph.text)
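When installing a third-party package is not an option, the standard library's html.parser module can do a similar job. The following is only a minimal sketch of that alternative, not one of the methods listed above: the names LinkCollector and extract_links and the sample markup are hypothetical, chosen for illustration. It collects the href attribute and the visible text of each <a> element, including text nested inside child tags, which the regular expression above would return with the inner tags still in it.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect (href, text) pairs for every <a> element fed to the parser."""

    def __init__(self):
        super().__init__()
        self.links = []        # finished (href, text) pairs
        self._in_link = False  # True while an <a> element is open
        self._href = None      # href of the currently open <a>, if any
        self._text = []        # text fragments seen inside the open <a>

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._in_link = True
            # attrs is a list of (name, value) tuples
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        if self._in_link:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._in_link:
            self.links.append((self._href, "".join(self._text).strip()))
            self._in_link = False


def extract_links(html):
    """Return a list of (href, text) tuples found in the given HTML string."""
    parser = LinkCollector()
    parser.feed(html)
    parser.close()
    return parser.links


# Hypothetical sample markup with a nested <b> tag inside the link
sample = "<p>See <a href='https://www.example.com'>Example <b>site</b></a> for details.</p>"
for url, text in extract_links(sample):
    print("URL:", url)
    print("Text:", text)
```

A link without an href attribute is reported with None as its URL, so the caller can filter such entries out if needed.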
All of these are common approaches. Which one to use depends on the site you are scraping and on how its data is structured.