网页爬虫 - 用python的requests库爬虫时,post参数后却没有返回预期的结果
高洛峰
高洛峰 2017-04-18 09:41:09
0
2
957

我使用的是python3.4,在用requests库爬虫时,向百度post了一个参数,但是返回来的内容却没有python关键词的搜索列表,这是什么原因呢?我的代码如下:

import requests
import urllib
from bs4 import BeautifulSoup

url = 'https://www.baidu.com/s?'
key_word={'key':'python'}
headers = {'User_Agent': 'Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/52.0.2743.116 Safari/537.36'}

def get_data(url):
    try:
        html=requests.post(url,data=key_word,headers=headers)
        if html.status_code == 200:
            soup=BeautifulSoup(html.text,'lxml')
            return soup
        else:
            print('error')
    except urllib.error.HTTPError as e:
        print(url, e, str(time.time()))

print(get_data(url))

以下是我返回的内容:

<!DOCTYPE html>
<!--STATUS OK--><html>
<head>
<meta content="IE=edge,chrome=1" http-equiv="X-UA-Compatible"/>
<meta content="text/html;charset=utf-8" http-equiv="content-type"/>
<meta content="always" name="referrer"/>
<title>页面不存在_百度搜索</title>
<style data-for="result">
        body {color: #333; background: #fff; padding: 0; margin: 0; position: relative; min-width: 700px; font-family: arial; font-size: 12px }
        p, form, ol, ul, li, dl, dt, dd, h3 {margin: 0; padding: 0; list-style: none }
        input {padding-top: 0; padding-bottom: 0; -moz-box-sizing: border-box; -webkit-box-sizing: border-box; box-sizing: border-box } img {border: none; }
        .logo {width: 117px; height: 38px; cursor: pointer }
         #wrapper {_zoom: 1 }
        #head {padding-left: 35px; margin-bottom: 20px; width: 900px }
        .fm {clear: both; position: relative; z-index: 297 }
        .btn, #more {font-size: 14px } 
        .s_btn {width: 95px; height: 32px; padding-top: 2px\9; font-size: 14px; padding: 0; background-color: #ddd; background-position: 0 -48px; border: 0; cursor: pointer }
        .s_btn_h {background-position: -240px -48px }
        .s_btn_wr {width: 97px; height: 34px; display: inline-block; background-position: -120px -48px; *position: relative; z-index: 0; vertical-align: top }
        #foot {}
        #foot span {color: #666 }
        .s_ipt_wr {height: 32px }
        .s_form:after, .s_tab:after {content: "."; display: block; height: 0; clear: both; visibility: hidden }
        .s_form {zoom: 1; height: 55px; padding: 0 0 0 10px }
        #result_logo {float: left; margin: 7px 0 0 }
        #result_logo img {width: 101px }
        #head {padding: 0; margin: 0; width: 100%; position: absolute; z-index: 301; min-width: 1000px; background: #fff; border-bottom: 1px solid #ebebeb; position: fixed; _position: absolute; -webkit-transform: translateZ(0) }
        #head .head_wrapper {_width: 1000px }
        #head.s_down {box-shadow: 0 0 5px #888 }
        .fm {clear: none; float: left; margin: 11px 0 0 10px }
        #s_tab {background: #f8f8f8; line-height: 36px; height: 38px; padding: 55px 0 0 121px; float: none; zoom: 1 }
        #s_tab a, #s_tab b {width: 54px; display: inline-block; text-decoration: none; text-align: center; color: #666; font-size: 14px }
        #s_tab b {border-bottom: 2px solid #38f; font-weight: bold; color: #323232 }
        #s_tab a:hover {color: #323232 }
        #content_left {width: 540px; padding-left: 121px; padding-top: 5px }
        .to_tieba, .to_zhidao_bottom {margin: 10px 0 0 121px }
        #help {background: #f5f6f5; zoom: 1; padding: 0 0 0 50px; float: right }
        #help a {color: #777; padding: 0 15px; text-decoration: none }
        #help a:hover {color: #333 }
        #foot {position: fixed; bottom:0; width: 100%; background: #f5f6f5; border-top: 1px solid #ebebeb; text-align: left; height: 42px; line-height: 42px; margin-top: 40px; *margin-top: 0; _position:absolute; _bottom:auto; _top:expression(eval(document.documentElement.scrollTop+document.documentElement.clientHeight-this.offsetHeight-(parseInt(this.currentStyle.marginTop,10)||0)-(parseInt(this.currentStyle.marginBottom,10)||0))); }

        .content_none {padding: 45px 0 25px 121px } .s_ipt_wr.bg,
        .s_btn_wr.bg, #su.bg {background-image: none }
        .s_ipt_wr.bg {background: 0 }
        .s_btn_wr {width: auto; height: auto; border-bottom: 1px solid transparent; *border-bottom: 0 }
        .s_btn {width: 100px; height: 34px; color: white; letter-spacing: 1px; background: #3385ff; border-bottom: 1px solid #2d78f4; outline: medium; *border-bottom: 0; -webkit-appearance: none; -webkit-border-radius: 0 }
        .s_btn:hover {background: #317ef3; border-bottom: 1px solid #2868c8; *border-bottom: 0; box-shadow: 1px 1px 1px #ccc }
        .s_btn:active {background: #3075dc; box-shadow: inset 1px 1px 3px #2964bb; -webkit-box-shadow: inset 1px 1px 3px #2964bb; -moz-box-shadow: inset 1px 1px 3px #2964bb; -o-box-shadow: inset 1px 1px 3px #2964bb }
        #lg {display: none }
        #head .headBlock {margin: -5px 0 6px 121px }
        #content_left .leftBlock {margin-bottom: 14px; padding-bottom: 5px; border-bottom: 1px solid #f3f3f3 }
        .s_ipt_wr {border: 1px solid #b6b6b6; border-color: #7b7b7b #b6b6b6 #b6b6b6 #7b7b7b; background: #fff; display: inline-block; vertical-align: top; width: 539px; margin-right: 0; border-right-width: 0; border-color: #b8b8b8 transparent #ccc #b8b8b8; overflow: hidden }
        .s_ipt_wr.ip_short {width: 439px; }
        .s_ipt_wr:hover, .s_ipt_wr.ipthover {border-color: #999 transparent #b3b3b3 #999 }
        .s_ipt_wr.iptfocus {border-color: #4791ff transparent #4791ff #4791ff }
        .s_ipt_tip {color: #aaa; position: absolute; z-index: -10; font: 16px/22px arial; height: 32px; line-height: 32px; padding-left: 7px; overflow: hidden; width: 526px }
        .s_ipt {width: 526px; height: 22px; font: 16px/18px arial; line-height: 22px\9; margin: 6px 0 0 7px; padding: 0; background: transparent; border: 0; outline: 0; -webkit-appearance: none }
        #kw {position: relative;display: inline-block;}
        input::-ms-clear {display: none }
        /*Error page css*/
        .norsSuggest {display: inline-block; color: #333; font-family: arial; font-size: 13px; position: relative; } 
        .norsTitle {font-size: 22px; font-family: Microsoft Yahei; font-weight: normal; color: #333; margin: 35px 0 25px 0; }
        .norsTitle2 {font-family: arial; font-size: 13px; color: #666; }
        .norsSuggest ol {margin-left: 47px; }
        .norsSuggest li {margin: 13px 0; }
    </style>
</head>
<body link="#0000cc">
<p class="wrapper_l" id="wrapper">
<p id="head">
<p class="head_wrapper">
<p class="s_form">
<p class="s_form_wrapper">
<a href="/" id="result_logo"><img alt="到百度首页" src="//www.baidu.com/img/baidu_jgylogo3.gif" title="到百度首页"/></a>
<form action="/s" class="fm" id="form" name="f">
<input name="ie" type="hidden" value="utf-8"/>
<input name="f" type="hidden" value="8"/>
<input name="rsv_bp" type="hidden" value="1"/>
<input name="ch" type="hidden" value=""/>
<input name="tn" type="hidden" value="baiduerr"/>
<input name="bar" type="hidden" value=""/>
<span class="bg s_ipt_wr iptfocus">
<input autocomplete="off" autofocus="" class="s_ipt" id="kw" maxlength="255" name="wd" value=""/>
</span><span class="bg s_btn_wr">
<input class="bg s_btn" id="su" type="submit" value="百度一下"/>
</span>
</form>
</p>
</p>
</p>
</p>
<p class="s_tab" id="s_tab"><b>网页</b><a href="http://news.baidu.com/ns?cl=2&amp;rn=20&amp;tn=news&amp;word=" wdfield="word">新闻</a><a href="http://tieba.baidu.com/f?kw=&amp;fr=wwwt" wdfield="kw">贴吧</a><a href="http://zhidao.baidu.com/q?ct=17&amp;pn=0&amp;tn=ikaslist&amp;rn=10&amp;word=&amp;fr=wwwt" wdfield="word">知道</a><a href="http://music.baidu.com/search?fr=ps&amp;ie=utf-8&amp;key=" wdfield="key">音乐</a><a href="http://image.baidu.com/i?tn=baiduimage&amp;ps=1&amp;ct=201326592&amp;lm=-1&amp;cl=2&amp;nc=1&amp;ie=utf-8&amp;word=" wdfield="word">图片</a><a href="http://v.baidu.com/v?ct=301989888&amp;rn=20&amp;pn=0&amp;db=0&amp;s=25&amp;ie=utf-8&amp;word=" wdfield="word">视频</a><a href="http://map.baidu.com/m?word=&amp;fr=ps01000" wdfield="word">地图</a><a href="http://wenku.baidu.com/search?word=&amp;lm=0&amp;od=0&amp;ie=utf-8" wdfield="word">文库</a><a href="//www.baidu.com/more/">更多»</a></p>
<p id="wrapper_wrapper">
<p id="content_left">
<p class="nors">
<p class="norsSuggest">
<h3 class="norsTitle">很抱歉,您要访问的页面不存在!</h3>
<p class="norsTitle2">温馨提示:</p>
<ol>
<li>请检查您访问的网址是否正确</li>
<li>如果您不能确认访问的网址,请浏览<a href="//www.baidu.com/more/index.html">百度更多</a>页面查看更多网址。</li>
<li>回到顶部重新发起搜索</li>
<li>如有任何意见或建议,请及时<a href="http://qingting.baidu.com/index">反馈给我们</a>。</li>
</ol>
</p>
</p>
</p>
</p>
<p id="foot">
<span id="help" style="float:left;padding-left:121px">
<a href="http://help.baidu.com/question" target="_blank">帮助</a>
<a href="http://www.baidu.com/search/jubao.html" target="_blank">举报</a>
<a href="http://jianyi.baidu.com" target="_blank">给百度提建议</a>
</span>
</p>
</p></body>
<script>
(function(){
    var bds = {
        util: {}
    };
    var c = document.getElementById('kw').parentNode;

    bds.util.getWinWidth = function(){
        return window.document.documentElement.clientWidth;
    };

    bds.util.setFormWidth = function(){
        var width = bds.util.getWinWidth();
        if(width < 1217)    {bds.util.setClass(c, 'ip_short', 'add')}
        else                {bds.util.setClass(c, 'ip_short', 'remove')};
    };

    bds.util.setClass = function(obj, class_name, set) {
        var ori_class = obj.className,
            has_class_p = -1,
            ori_class_arr = [],
            new_class = '';

        if(ori_class.length) ori_class_arr = ori_class.split(' ');

        for( i in ori_class_arr) {
            if(ori_class_arr[i] == class_name) has_class_p = i;
        }

        if( set == 'remove' && has_class_p >= 0) {
            ori_class_arr.splice(has_class_p, 1);
            new_class = ori_class_arr.join(' ');
            obj.className = new_class;
        } else if( set == 'add' && has_class_p < 0) {
            ori_class_arr.push(class_name);
            new_class = ori_class_arr.join(' ');
            obj.className = new_class;
        }
    }
    bds.util.setFormWidth();

    if (typeof document.addEventListener != "undefined") {
        window.addEventListener('resize', bds.util.setFormWidth, false);
        document.getElementById('kw').addEventListener('focus', function(){bds.util.setClass(c,'iptfocus', 'add');}, false);
        document.getElementById('kw').addEventListener('blur', function(){bds.util.setClass(c,'iptfocus', 'remove');}, false);
    } else {
        window.attachEvent('onresize', bds.util.setFormWidth, false);
        document.getElementById('kw').attachEvent('onfocus', function(){bds.util.setClass(c,'iptfocus', 'add');}, false);
        document.getElementById('kw').attachEvent('onblur', function(){bds.util.setClass(c,'iptfocus', 'remove');}, false);
    } 

})();

</script>
</html>
高洛峰
高洛峰

拥有18年软件开发和IT教学经验。曾任多家上市公司技术总监、架构师、项目经理、高级软件工程师等职务。 网络人气名人讲师,...

membalas semua(2)
小葫芦

POST tidak boleh dilakukan, gunakan GET untuk meminta URL ini http://www.baidu.com/s?wd=python

Anda boleh mengujinya seperti ini di bawah bash:

curl http://www.baidu.com/s\?wd\=python > python.html

Kemudian buka ini python.html dan lihat

Peter_Zhu

Anda bercakap tentang senarai gesaan carian Baidu Mereka menggunakan permintaan dapatkan, bukan siaran

>>> r=requests.get('https://sp0.baidu.com/5a1Fazu8AA54nxGko9WTAnF6hhy/su?wd=pyth
on')
>>> r.text
'window.baidu.sug({q:"python",p:false,s:["python基础教程","python set","python j
son","python mysql","python web开发","python requests","python for循环","python3
","python环境变量设置","python 多线程"]});'

URL yang anda masukkan terus ialah https://www.baidu.com/baidu?word=python&ie=utf-8&tn=98012088_2_dg&ch=9
Hanya permintaan.dapatkan terus

Muat turun terkini
Lagi>
kesan web
Kod sumber laman web
Bahan laman web
Templat hujung hadapan