
Detailed explanation of HTML parsing methods using Python's BeautifulSoup

高洛峰
Release: 2017-03-31 11:36:53

1) Search tag:

find(tagname)         # Search for the tag named tagname, such as: find('head')
find(list)            # Search for tags whose name is in the list, such as: find(['head', 'body'])
find(dict)            # Search for tags whose name is a key of the dict (a form documented for BeautifulSoup 3), such as: find({'head':True, 'body':True})
find(re.compile(''))  # Search for tags whose name matches a regular expression, such as: find(re.compile('^p')) finds tags whose name starts with p
find(lambda)          # Search for tags for which the function returns true, such as: find(lambda tag: len(tag.name) == 1) finds tags whose name has length 1
find(True)            # Search for all tags

find returns only the first match; findAll accepts the same arguments and returns every match.
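The filter types listed above can be tried side by side in a minimal sketch. The markup and the variable names here are made up for illustration, and the built-in 'html.parser' backend is assumed so that no extra parser needs to be installed. The dict form is left out because it is not among the filters documented for bs4.

```python
import re
from bs4 import BeautifulSoup

# Hypothetical sample markup, not taken from the article
html = ("<html><head><title>Demo</title></head>"
        "<body><p>First <b>bold</b></p><pre>code</pre></body></html>")
soup = BeautifulSoup(html, "html.parser")

by_name = soup.find("head")                          # string: exact tag name
by_list = soup.find(["title", "pre"])                # list: first tag whose name is in the list
by_regex = soup.find(re.compile("^p"))               # regex: first tag whose name starts with "p"
by_func = soup.find(lambda tag: len(tag.name) == 1)  # function: receives a Tag, returns True/False
first_tag = soup.find(True)                          # True: the first tag of any kind

print(by_name.name, by_list.name, by_regex.name, by_func.name, first_tag.name)
```

Note that the function receives a whole Tag object rather than a bare name, which is why the lambda reads tag.name.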

2) Search text (the text argument, named string since bs4 4.4.0):
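Searching text means matching the text nodes of the document instead of its tags. The sketch below is an illustration on made-up markup, assuming bs4 4.4.0 or later, where the argument is spelled string (older releases call it text). It accepts the same kinds of filter as a tag search.

```python
import re
from bs4 import BeautifulSoup

# Hypothetical sample markup, not taken from the article
html = "<p>This is paragraph <b>one</b>.</p><p>Short</p>"
soup = BeautifulSoup(html, "html.parser")

# string= is called text= in bs4 releases before 4.4.0
matching = soup.find_all(string=re.compile("paragraph"))  # text nodes matching a regex
all_text = soup.find_all(string=True)                     # every text node
short = soup.find_all(string=lambda s: len(s) < 4)        # text nodes shorter than 4 characters

print(matching)
print(all_text)
print(short)
```

The results are text nodes (NavigableString objects), not tags, so they have no name attribute.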

3) recursive, limit:

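from bs4 import BeautifulSoup
import re

# Sample document; the unclosed <p> tags are left for the parser to repair
doc = ['<html><head><title>Page title</title></head>',
       '<body><p id="firstpara" align="center">This is paragraph <b>one</b>.',
       '<p id="secondpara" align="blah">This is paragraph <b>two</b>.',
       '</html>']
# Name the parser explicitly so bs4 does not have to guess one
soup = BeautifulSoup(''.join(doc), 'html.parser')

print(soup.prettify() + "\n")
# 1) Search by tag name (find_all is the bs4 spelling of findAll)
print(soup.find_all('b'))

# 2) Search text nodes: by regular expression, all of them, or by a function
#    (string= is the bs4 4.4.0+ name of the older text= argument)
print(soup.find_all(string=re.compile("paragraph")))
print(soup.find_all(string=True))
print(soup.find_all(string=lambda x: len(x) < 12))

# Tags whose name starts with "b" (here: body and b)
a = soup.find_all(re.compile('^b'))
print([tag.name for tag in a])

# 3) recursive=False restricts the search to the direct children of <html>
print([tag.name for tag in soup.html.find_all()])
print([tag.name for tag in soup.html.find_all(recursive=False)])

# limit=1 stops the search after the first match
print(soup.find_all('p', limit=1))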
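The recursive and limit arguments from section 3 are easier to see in isolation. The following is a small sketch on made-up, well-formed markup, again assuming the built-in 'html.parser' backend. It compares a default search with one restricted to direct children, and caps the number of results.

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup, not taken from the article
html = ("<html><head><title>T</title></head>"
        "<body><p>one</p><p>two</p><p>three</p></body></html>")
soup = BeautifulSoup(html, "html.parser")

# Default (recursive=True): every descendant of <html>
descendants = [tag.name for tag in soup.html.find_all()]
# recursive=False: only the direct children of <html>
children = [tag.name for tag in soup.html.find_all(recursive=False)]
# limit=2: stop searching after the first two matches
first_two = [p.get_text() for p in soup.find_all("p", limit=2)]

print(descendants)
print(children)
print(first_two)
```

Setting limit is useful on large documents when only the first few matches are needed, since the search stops as soon as the limit is reached.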