python - Two questions on an elegant BeautifulSoup way to match the Douban Top 250 movies?

Question

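Link to the Douban Top 250 movies: {code...} The page's DOM is generally laid out like this, and I have two questions. Taking the movie title as an example: two tags both have class title, so my crude approach matches both titles. Is there a way to match only the first, Chinese title? {co...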

巴扎黑 · Answer

See if this works too

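import requests
from bs4 import BeautifulSoup

def get_top250(url):
    html = requests.get(url).content
    soup = BeautifulSoup(html, 'lxml')
    soup = soup.find('ol', class_="grid_view")
    for titles in soup.find_all('li'):
        # find() returns only the first match, so this yields a single title
        print(titles.find('span', class_="title").text)
        # Get the director and lead actor info
        # (matched by class only, so the tag type of the "bd" container does not matter)
        print(titles.find(class_="bd").find('p').text.strip().replace('\n', ''))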
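To see the difference between find and find_all without hitting the network, here is a minimal sketch on a hand-written fragment. The HTML in SAMPLE is an assumption modeled on the structure the question describes (two spans sharing class title), not a copy of the live page, and the variable names are only illustrative.

```python
from bs4 import BeautifulSoup

# Hypothetical fragment: two spans share class "title", as described in the question
SAMPLE = """
<li>
  <span class="title">肖申克的救赎</span>
  <span class="title">&nbsp;/&nbsp;The Shawshank Redemption</span>
</li>
"""

item = BeautifulSoup(SAMPLE, "html.parser")

# find_all() returns every match, so both titles come back
all_titles = [s.text for s in item.find_all("span", class_="title")]

# Three equivalent ways to keep only the first match
first_by_index = item.find_all("span", class_="title")[0].text
first_by_find = item.find("span", class_="title").text
first_by_css = item.select_one("span.title").text

print(len(all_titles), first_by_find)
```

find and select_one stop at the first match, which avoids building a list just to discard everything after index 0.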

PHP中文网 · Answer

1: Just index the result to take the first match

titles = soup.find_all(name='span', attrs={'class': 'title'})[0].text

If the element with class bd contains no other p, just look for the p directly inside it:

content = soup.find(attrs={'class':'bd'}).find('p').text
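The nested lookup can be tried on a small fragment as well. This is only a sketch: SAMPLE is a hypothetical fragment (the container tag and the second quote paragraph are assumptions about the layout), and collapsing whitespace with split/join is one possible cleanup, not something the answer prescribes.

```python
from bs4 import BeautifulSoup

# Hypothetical fragment: a "bd" container whose first p holds the info lines
SAMPLE = """
<div class="bd">
  <p class="">
    导演: 弗兰克·德拉邦特<br>
    1994 / 美国 / 犯罪 剧情
  </p>
  <p class="quote">希望让人自由。</p>
</div>
"""

soup = BeautifulSoup(SAMPLE, "html.parser")

# Match the container by class only, then take its first p
info_p = soup.find(class_="bd").find("p")

# Collapse newlines and runs of spaces into single spaces
info = " ".join(info_p.get_text().split())
print(info)
```

Because find returns the first p inside the container, any later p (such as the quote in this fragment) is ignored.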

Updated answer:

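import requests
from bs4 import BeautifulSoup as BS

# Name the parser explicitly so bs4 does not have to guess one
soup = BS(requests.get('https://movie.douban.com/top250').text, 'html.parser')

ol = soup.find('ol', attrs={"class":'grid_view'}) # find the ol that contains the movies
lis = ol.find_all('li') # find every movie li

for movie in lis:
    # Process each movie the same way as above,
    # e.g. take the first span with class "title"
    print(movie.find_all(name='span', attrs={'class': 'title'})[0].text)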
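One way the per-movie processing might be filled in is sketched below. The function name parse_movies, the dict keys, and the SAMPLE fragment are all hypothetical. The sketch parses an HTML string rather than fetching the page, so it can be run offline, and it returns None for any field that is missing instead of raising.

```python
from bs4 import BeautifulSoup

def parse_movies(html):
    """Return one {'title', 'info'} dict per li inside ol.grid_view."""
    soup = BeautifulSoup(html, "html.parser")
    ol = soup.find("ol", class_="grid_view")
    if ol is None:  # list not found, e.g. the layout changed
        return []
    movies = []
    for li in ol.find_all("li"):
        title = li.find("span", class_="title")  # first title only
        bd = li.find(class_="bd")
        info_p = bd.find("p") if bd else None
        movies.append({
            "title": title.text if title else None,
            "info": " ".join(info_p.get_text().split()) if info_p else None,
        })
    return movies

# Hypothetical fragment; the second item deliberately lacks a "bd" block
SAMPLE = """
<ol class="grid_view">
  <li>
    <span class="title">肖申克的救赎</span>
    <span class="title">&nbsp;/&nbsp;The Shawshank Redemption</span>
    <div class="bd"><p>导演: 弗兰克·德拉邦特<br>1994 / 美国</p></div>
  </li>
  <li>
    <span class="title">霸王别姬</span>
  </li>
</ol>
"""

movies = parse_movies(SAMPLE)
for m in movies:
    print(m)
```

To run this against the live page, pass in the text of a requests response. The site may reject requests that lack a browser-like User-Agent header, so supplying one through the headers argument of requests.get is worth trying if the list comes back empty.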

Addendum