Using XPath with BeautifulSoup
BeautifulSoup is a popular Python library for parsing and manipulating HTML documents. However, it does not natively support XPath expressions.
Alternative: lxml
An alternative library, lxml, provides full XPath 1.0 support. Its HTML parser recovers from broken markup, and it also offers a BeautifulSoup-compatible mode (lxml.html.soupparser) that parses broken HTML the way BeautifulSoup does. To use XPath with lxml:
    from lxml import etree
    from urllib import request

    url = "http://www.example.com/servlet/av/ResultTemplate=AVResult.html"
    response = request.urlopen(url)
    # HTMLParser tolerates malformed HTML
    tree = etree.parse(response, etree.HTMLParser())
    # This path matches only if the markup itself contains a <tbody> element
    result_list = tree.xpath("/html/body/div/table/tbody/tr[1]/td[1]")
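To try the same idea without a network request, here is a minimal sketch that parses an inline HTML string. The SAMPLE_HTML markup and the variable names are made up for illustration and are not part of the original example. It also shows a common pitfall: browsers add tbody to tables when they render a page, but lxml's HTML parser does not, so an XPath copied from browser developer tools may match nothing.

```python
from lxml import etree

# Hypothetical sample markup, used here instead of fetching a real page.
SAMPLE_HTML = """
<html><body><div>
<table id="foobar">
  <tr><td class="empformbody">Alice</td><td>Engineering</td></tr>
  <tr><td class="empformbody">Bob</td><td>Sales</td></tr>
</table>
</div></body></html>
"""

# fromstring() with an HTMLParser returns the root <html> element.
tree = etree.fromstring(SAMPLE_HTML, etree.HTMLParser())

# The source markup has no <tbody>, and lxml does not add one,
# so the path that includes tbody finds nothing.
with_tbody = tree.xpath("/html/body/div/table/tbody/tr[1]/td[1]")

# Dropping tbody from the path matches the first cell of the first row.
without_tbody = tree.xpath("/html/body/div/table/tr[1]/td[1]")
first_cell_text = [td.text for td in without_tbody]

print(len(with_tbody), first_cell_text)
```

If the path has to work whether or not tbody is present, a descendant step such as //table//tr[1]/td[1] is one option, at the cost of being less specific.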
Using CSS Selectors with lxml
lxml also provides CSSSelector (in lxml.cssselect, which depends on the separate cssselect package). It translates CSS selectors into XPath expressions. For example, to find td elements with the class empformbody:
    from lxml.cssselect import CSSSelector

    css_selector = CSSSelector('td.empformbody')
    result_list = css_selector(tree)
CSS Selectors in BeautifulSoup
BeautifulSoup also has its own CSS selector support through the select() method, so simple queries may not need XPath at all:
    from bs4 import BeautifulSoup

    # html holds the page markup as a string
    soup = BeautifulSoup(html, "html.parser")
    result_list = soup.select('table#foobar td.empformbody')
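A self-contained version of the same query, as a rough sketch: the html string below is invented sample markup, and the names variable is only for illustration. select() returns a list of all matches, while select_one() returns the first match or None.

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup for illustration only.
html = """
<table id="foobar">
  <tr><td class="empformbody">Alice</td><td>Engineering</td></tr>
  <tr><td class="empformbody">Bob</td><td>Sales</td></tr>
</table>
<table id="other">
  <tr><td class="empformbody">Ignored</td></tr>
</table>
"""

soup = BeautifulSoup(html, "html.parser")

# Only cells inside the table with id "foobar" are selected.
result_list = soup.select("table#foobar td.empformbody")
names = [td.get_text(strip=True) for td in result_list]

# select_one() gives just the first matching element.
first = soup.select_one("table#foobar td.empformbody")

print(names, first.get_text(strip=True))
```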