python - Problem getting web page content with BeautifulSoup
習慣沉默 2017-05-27 17:39:42
I want the content inside this element:
<p class="talk-article__body talk-transcript__body">

Python code:

neirong = soup.find('p', {'class': 'talk-article__body talk-transcript__body'})

But the result comes back empty. Is the selector written incorrectly?


All replies (6)
某草草
neirong = soup.find_all('p', class_='talk-article__body talk-transcript__body')

https://www.crummy.com/softwa...

阿神

According to the documentation at https://www.crummy.com/softwa..., the correct usage is:
neirong = soup.find('p', class_='talk-article__body talk-transcript__body')

To get the content inside the p element, just call neirong.contents.
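
A minimal sketch of the suggestion above, assuming the target p element is actually present in the HTML handed to BeautifulSoup. The html string is a made-up stand-in for the real page. Passing both class names as one string to class_ matches only when the class attribute is exactly that string, in that order.

```python
from bs4 import BeautifulSoup

# Made-up stand-in for the real page; only the class names come from the question.
html = '<p class="talk-article__body talk-transcript__body">Hello <b>world</b></p>'
soup = BeautifulSoup(html, "html.parser")

# class_ with the full attribute string matches the exact class value.
neirong = soup.find("p", class_="talk-article__body talk-transcript__body")

# find() returns None when nothing matches, so check before using the result.
if neirong is not None:
    print(neirong.contents)    # list of child nodes (strings and tags)
    print(neirong.get_text())  # all text inside the element, joined together
```

If find() still returns None on the real page, the element is probably not in the downloaded HTML at all.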

刘奇
neirong = soup.select('.talk-article__body.talk-transcript__body')
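
A small sketch of how this selector behaves, using a made-up one-line document in place of the real page. select() returns a list, and chaining the classes with dots matches regardless of the order they appear in the attribute, while the exact-string form of class_ does not.

```python
from bs4 import BeautifulSoup

# Made-up stand-in for the real page.
html = '<p class="talk-article__body talk-transcript__body">Transcript text</p>'
soup = BeautifulSoup(html, "html.parser")

# Chained class selector: order of the class names does not matter.
matches = soup.select("p.talk-transcript__body.talk-article__body")

# select_one() returns the first match, or None if there is none.
first = soup.select_one(".talk-article__body.talk-transcript__body")

# By contrast, the exact-string form of class_ is order-sensitive.
reversed_find = soup.find_all("p", class_="talk-transcript__body talk-article__body")

print(len(matches), first.get_text(), len(reversed_find))
```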
刘奇

The content you see in the browser is generated dynamically by JavaScript, so BeautifulSoup cannot match it in the HTML you downloaded. In my experience, odd-looking class names like these are almost always generated by JS.
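
One way to check this, sketched with a made-up page (raw_html is hypothetical, not the real site's markup): the class name appears in the downloaded text, but only inside a script, so the parser never sees a real p element.

```python
from bs4 import BeautifulSoup

# Hypothetical page: the paragraph exists only as a string inside a script,
# which a browser would run but BeautifulSoup will not.
raw_html = """
<html><body>
<div id="app"></div>
<script>
  document.getElementById("app").innerHTML =
    '<p class="talk-article__body talk-transcript__body">Transcript text</p>';
</script>
</body></html>
"""

soup = BeautifulSoup(raw_html, "html.parser")
neirong = soup.find("p", class_="talk-article__body talk-transcript__body")

# The class name is in the raw text, yet no matching tag was parsed.
in_raw_text = "talk-transcript__body" in raw_html
print(in_raw_text, neirong)
```

If the real page behaves like this, the selector is not the problem. The content has to come from wherever the script loads it, or from a tool that executes JavaScript.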

给我你的怀抱

Use find_all, find cannot be used for class

曾经蜡笔没有小新
  • Personally, when parsing web pages with BeautifulSoup, if you want to locate elements the way CSS does, soup.select() is the best choice. It accepts a class value, a tag name, or an attribute as its argument, which is very convenient, and it works well when you are looking for a single tag. It also supports full CSS selector strings, such as soup.select("#id > .class a.title").

  • soup.find() does not seem to be used much these days; I am not sure whether BeautifulSoup4 has deprecated it. Nowadays, wherever find appears, it is usually find_all() or a related method.
    For details, see the Chinese Beautiful Soup documentation: http://beautifulsoup.readthed...
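
A short sketch of the kind of selector string mentioned above. The markup, the id main, and the class names post and title are all invented for illustration. The selector reads: a elements with class title, anywhere inside a .post element that is a direct child of #main.

```python
from bs4 import BeautifulSoup

# Invented markup for illustration only.
html = """
<div id="main">
  <div class="post">
    <span><a class="title" href="/a">First</a></span>
    <a class="other" href="/b">Second</a>
  </div>
  <a class="title" href="/c">Outside</a>
</div>
"""
soup = BeautifulSoup(html, "html.parser")

# select() always returns a list, even for a single match.
links = soup.select("#main > .post a.title")
titles = [a.get_text() for a in links]

# select_one() returns a single element, or None when nothing matches.
missing = soup.select_one("#main > .missing")

print(titles, missing)
```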
