Source: Python (programming language): What are the best Python scripts you've ever written?
I'd like to see more, and even cooler, stuff :- |
<code class="language-python"># -*- coding: UTF-8 -*-
# Python 2 script: it relies on urllib2 and the print statement.
__author__ = 'ftium4.com'

# urllib2 is used to fetch the web pages
import urllib2
# The xpath module of the open-source webscraping library does the extraction
from webscraping import xpath, common


def get_data(url):
    req = urllib2.Request(url)
    req.add_header('User-Agent', 'Mozilla/5.0 (Windows; U; Windows NT 5.1; zh-CN; rv:1.8.1.14) Gecko/20080404 (FoxPlus) Firefox/2.0.0.14')
    # Get the response
    response = urllib2.urlopen(req)
    # Store the response body in the html variable
    html = response.read()
    # Scrape the ID code and title of each entry on the page
    title = xpath.search(html, '//div[@class="av style1"]/a[1]/@title')
    return title


# Create a text file to hold the scraped results
f = open(r'D:\f.txt', 'w')
for p in range(1, 494):
    url = r'http://dmm18.net/index.php?pageno_b=%s' % p
    print url
    title = get_data(url)
    for item1 in title:
        # Write each scraped result to the text file
        f.write(str(item1) + '\n')
        print item1
f.close()
</code>
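For anyone on Python 3, where urllib2 no longer exists, here is a rough sketch of the same fetch-and-extract step using only the standard library. It is not the author's code: `TitleParser`, `extract_titles`, `fetch` and `SAMPLE` are hypothetical names made up for illustration, the UTF-8 decoding is an assumption about the page encoding, and the parser only approximates the XPath above (it takes the first `<a>` found anywhere inside each matching `<div>`, not strictly the first direct child).

```python
from html.parser import HTMLParser
from urllib.request import Request, urlopen

USER_AGENT = ('Mozilla/5.0 (Windows; U; Windows NT 5.1; zh-CN; rv:1.8.1.14) '
              'Gecko/20080404 (FoxPlus) Firefox/2.0.0.14')


def fetch(url):
    """Download a page as text (assumes UTF-8; undecodable bytes are replaced)."""
    req = Request(url, headers={'User-Agent': USER_AGENT})
    with urlopen(req) as response:
        return response.read().decode('utf-8', errors='replace')


class TitleParser(HTMLParser):
    """Collect the title of the first <a> inside each <div class="av style1">."""

    def __init__(self, div_class='av style1'):
        super().__init__()
        self.div_class = div_class
        self.titles = []
        self._div_depth = 0   # 0 means we are outside any matching <div>
        self._taken = False   # True once the first <a> of the current <div> was seen

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == 'div':
            if self._div_depth:
                self._div_depth += 1          # nested <div> inside a match
            elif attrs.get('class') == self.div_class:
                self._div_depth = 1
                self._taken = False
        elif tag == 'a' and self._div_depth and not self._taken:
            self._taken = True
            if attrs.get('title') is not None:
                self.titles.append(attrs['title'])

    def handle_endtag(self, tag):
        if tag == 'div' and self._div_depth:
            self._div_depth -= 1


def extract_titles(html):
    parser = TitleParser()
    parser.feed(html)
    parser.close()
    return parser.titles


# Made-up markup that mimics the structure the XPath expects.
SAMPLE = '''
<div class="av style1"><a href="/1" title="First title">x</a><a href="/1b" title="Ignored">y</a></div>
<div class="other"><a href="/2" title="Not matched">z</a></div>
<div class="av style1"><a href="/3" title="Second title">w</a></div>
'''

print(extract_titles(SAMPLE))  # ['First title', 'Second title']
```

To scrape a real page you would call `extract_titles(fetch(url))` inside the same `for p in range(1, 494)` loop and write each title to the output file; the sample string just keeps the example runnable offline.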