python - Crawling our college's website returns 404 Not Found
PHPz 2017-04-17 17:43:42
# -*- encoding: utf8 -*-

import urllib
import urllib2
import re

page = 1
url = u'http://math.xmu.edu.cn/' + str(page)
user_agent = 'Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/49.0.2623.112 Safari/537.36'
headers = { 'User-Agent' : user_agent}
try:
    request = urllib2.Request(url,headers = headers)
    response = urllib2.urlopen(request)
    content = response.read().decode('utf-8')
    pattern = re.compile(r'<article class="home_news_l">.*?<p>(.*?)</p>.*?<p>(.*?)</p></article>',re.S)
    items = re.findall(pattern,content)
    for item in items:
        print item.encode('utf-8')

except urllib2.URLError, e:
    if hasattr(e,"code"):
        print e.code
    if hasattr(e,"reason"):
        print e.reason

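I can open this website in a browser, but the crawler always gets a 404. I am sending headers as well, so I don't know where the problem is. Thank you.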


All replies (1)
巴扎黑


The URL you constructed is http://math.xmu.edu.cn/1 (the base address with str(page) appended), and that URL does not exist. Check it carefully first.
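
A minimal sketch of one way to apply this, written for Python 3 (urllib.request and urllib.error take the place of Python 2's urllib2). It requests the base address without the page number appended. The helper names fetch and extract_items are hypothetical, the shortened User-Agent string is a placeholder, and it is an assumption that the homepage is UTF-8 and that its markup still matches the pattern from the question. Because the pattern has two capture groups, re.findall returns tuples of two strings, so each part has to be handled separately rather than calling item.encode() on the tuple.

```python
import re
import urllib.error
import urllib.request

# Base address from the question, with no page number appended.
BASE_URL = 'http://math.xmu.edu.cn/'
# Placeholder User-Agent (assumption); any browser-like string can be used.
USER_AGENT = 'Mozilla/5.0 (Windows NT 10.0; WOW64)'
# Pattern copied from the question; each match is a tuple of two strings.
PATTERN = re.compile(
    r'<article class="home_news_l">.*?<p>(.*?)</p>.*?<p>(.*?)</p></article>',
    re.S,
)


def fetch(url):
    """Return the decoded body of url, or None if the request fails."""
    request = urllib.request.Request(url, headers={'User-Agent': USER_AGENT})
    try:
        with urllib.request.urlopen(request, timeout=10) as response:
            # Assumes the page is UTF-8 encoded.
            return response.read().decode('utf-8')
    except urllib.error.HTTPError as e:
        # HTTPError is a subclass of URLError, so it is caught first.
        print(e.code, url)
    except urllib.error.URLError as e:
        print(e.reason, url)
    return None


def extract_items(content):
    """Return a list of (first_paragraph, second_paragraph) tuples."""
    return PATTERN.findall(content)
```

To use it, call content = fetch(BASE_URL) and, if the result is not None, loop over extract_items(content) and print the two strings of each tuple. Printing the failing URL alongside the status code makes a wrongly constructed address easy to spot.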
