Code:

```python
#!/usr/bin/python
# coding: utf-8
__author__ = 'eyu Fanne'
import requests
from bs4 import BeautifulSoup

move_url = 'https://movie.douban.com/'


def Robot():
    res_url = requests.get(move_url)
    print(res_url.status_code)
    soup = BeautifulSoup(res_url.text, 'lxml')
    print(soup.title)
    # Collect every <a class="item"> element in the downloaded HTML
    soup_a = soup.find_all("a", class_="item")
    for i in soup_a:
        print(i)
    print(soup_a)


if __name__ == '__main__':
    Robot()
```
Result:

```
200
<title>
豆瓣电影
</title>
[]
```
I'm trying to scrape the values inside the `<a class='item' ....>` tags, but the result is empty. Why is that?
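Look at the page source: the movie information isn't in it. The list is rendered in the browser by JavaScript, so the HTML that `requests` downloads contains no `<a class="item">` elements, and `find_all` returns an empty list.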
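One way to confirm this is to count matching tags in the raw HTML, i.e. what the server sends before any JavaScript runs. Below is a minimal standard-library sketch; `count_item_links` and `fetch_raw_html` are hypothetical helper names, and sending a browser-like `User-Agent` is an assumption (some sites reject the default one).

```python
from html.parser import HTMLParser
from urllib.request import Request, urlopen


class ItemLinkCounter(HTMLParser):
    """Count <a> tags whose class attribute includes "item"."""

    def __init__(self):
        super().__init__()
        self.count = 0

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        # attrs is a list of (name, value) pairs; value may be None
        classes = (dict(attrs).get("class") or "").split()
        if "item" in classes:
            self.count += 1


def count_item_links(html):
    parser = ItemLinkCounter()
    parser.feed(html)
    parser.close()
    return parser.count


def fetch_raw_html(url="https://movie.douban.com/"):
    # Assumption: a browser-like User-Agent avoids being rejected outright
    req = Request(url, headers={"User-Agent": "Mozilla/5.0"})
    with urlopen(req, timeout=10) as resp:
        return resp.read().decode("utf-8", errors="replace")
```

Usage: `print(count_item_links(fetch_raw_html()))`. If this prints `0` while the browser shows a list of movies, the links are being inserted by JavaScript after the page loads.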
The data is loaded from a JSON endpoint; you can open this link directly: https://movie.douban.com/j/search_subjects?type=movie&tag=%E7%83%AD%E9%97%A8&sort=recommend&page_limit=20&page_start=0
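A rough sketch of querying that endpoint directly instead of parsing HTML. The query parameters are copied from the link above (`tag="热门"` URL-encodes to `%E7%83%AD%E9%97%A8`). The endpoint is undocumented, so the response shape assumed here (a `subjects` list whose items carry `title`, `rate` and `url`) is an assumption that may not hold; the function names are hypothetical.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen

# Undocumented endpoint taken from the link above; it may change without notice
SEARCH_URL = "https://movie.douban.com/j/search_subjects"


def build_search_url(tag="热门", page_limit=20, page_start=0):
    params = {
        "type": "movie",
        "tag": tag,
        "sort": "recommend",
        "page_limit": page_limit,
        "page_start": page_start,
    }
    # urlencode percent-encodes the tag as UTF-8
    return SEARCH_URL + "?" + urlencode(params)


def parse_subjects(payload):
    """Return (title, rate, url) tuples.

    Assumed shape: {"subjects": [{"title": ..., "rate": ..., "url": ...}, ...]}
    """
    return [
        (item.get("title"), item.get("rate"), item.get("url"))
        for item in payload.get("subjects", [])
    ]


def fetch_subjects(**kwargs):
    # Assumption: a browser-like User-Agent is needed for the request to succeed
    req = Request(build_search_url(**kwargs), headers={"User-Agent": "Mozilla/5.0"})
    with urlopen(req, timeout=10) as resp:
        return parse_subjects(json.loads(resp.read().decode("utf-8")))
```

Paging would presumably work by increasing `page_start` in steps of `page_limit`, though that is inferred from the parameter names rather than documented.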
Douban Movies also has a public API, so why scrape the page at all?
http://developers.douban.com/wiki/?title=movie_v2