python - Crawling our college's website returns 404 Not Found - PHP Chinese Network Q&A

python - Crawling our college's website returns 404 Not Found

PHPz 2017-04-17 17:43:42


# -*- encoding: utf8 -*-

import urllib
import urllib2
import re

page = 1
url = u'http://math.xmu.edu.cn/' + str(page)
user_agent = 'Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/49.0.2623.112 Safari/537.36'
headers = { 'User-Agent' : user_agent}
try:
    request = urllib2.Request(url,headers = headers)
    response = urllib2.urlopen(request)
    content = response.read().decode('utf-8')
    pattern = re.compile(r'<article class="home_news_l">.*?<p>(.*?)</p>.*?<p>(.*?)</p></article>',re.S)
    items = re.findall(pattern,content)
    for item in items:
        print item.encode('utf-8')

except urllib2.URLError, e:
    if hasattr(e,"code"):
        print e.code
    if hasattr(e,"reason"):
        print e.reason

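I can open this website in a browser, but the crawler always gets a 404. I am also sending headers, so I don't know where the problem is. Thank you.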


All replies (1)

巴扎黑 2017-04-17 17:45:42 (1st reply)

The URL you constructed is http://math.xmu.edu.cn/1 (the base address with str(page) appended), and that URL does not exist. Check it carefully first.
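To illustrate the point above, here is a minimal sketch in Python 3 (the question's code is Python 2, so `urllib2` becomes `urllib.request`). It assumes the homepage itself is the page to scrape and that it is UTF-8 encoded, as the question's code does; the site's real pagination scheme, if any, is not known here. The helper names `fetch` and `extract_items`, the shortened User-Agent, and the sample HTML are hypothetical and only for illustration. It also shows that the question's pattern has two capture groups, so `re.findall` returns 2-tuples, and calling `.encode` on an item would fail even once the URL is fixed.

```python
import re
import urllib.error
import urllib.request

BASE_URL = 'http://math.xmu.edu.cn/'  # homepage address from the question
USER_AGENT = 'Mozilla/5.0 (Windows NT 10.0; WOW64)'  # shortened, illustrative

# What the question's code requests: the base URL with str(page) appended.
bad_url = BASE_URL + str(1)
print(bad_url)  # http://math.xmu.edu.cn/1

# Same pattern as in the question. Two groups means findall yields 2-tuples.
PATTERN = re.compile(
    r'<article class="home_news_l">.*?<p>(.*?)</p>.*?<p>(.*?)</p></article>',
    re.S)


def fetch(url):
    """Return the decoded body, or None after printing the error (hypothetical helper)."""
    request = urllib.request.Request(url, headers={'User-Agent': USER_AGENT})
    try:
        with urllib.request.urlopen(request) as response:
            # Assumes UTF-8, as the question's code does.
            return response.read().decode('utf-8')
    except urllib.error.HTTPError as e:
        # HTTPError is a subclass of URLError, so it must be caught first.
        print(url, '->', e.code, e.reason)
    except urllib.error.URLError as e:
        print(url, '->', e.reason)
    return None


def extract_items(content):
    """Return a list of (first_paragraph, second_paragraph) tuples."""
    return PATTERN.findall(content)


# Offline check with made-up HTML shaped like what the pattern expects.
sample = ('<article class="home_news_l"><h2>News</h2>'
          '<p>first</p><div></div><p>second</p></article>')
for first, second in extract_items(sample):
    print(first, second)
```

To try it against the live site, call `fetch(BASE_URL)` and pass the result to `extract_items` when it is not `None`. Printing the requested URL next to the status code, as `fetch` does, makes a wrongly built address such as the one above visible immediately.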
