Scraping the Most-Read Ranking from a CSDN Blog Sidebar with PHP
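My project needs scraped data, so I tried scraping a CSDN blog first. The script uses the Simple HTML DOM library (official site), which makes it easy to traverse an HTML document.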
<?php
include_once('simple_html_dom.php');
header('Content-Type:text/html;charset=utf-8');

$html = file_get_html('http://blog.csdn.net/szy361');
// Select the <a> elements that carry a title attribute, inside the <li> items
// of the ul with class panel_body under the element with id="hotarticls"
$res = $html->find('#hotarticls ul.panel_body li a[title]');
// Select the <span> elements in the same list items
$span = $html->find('#hotarticls ul.panel_body li span');

// Initialize the arrays so the loops below never touch undefined variables
$arr = $brr = $crr = [];
foreach ($res as $element) {
    // Join the title value and the href value with '+'
    $arr[] = $element->title . '+' . $element->href;
}
foreach ($span as $e) {
    // Collect the text inside each span
    $brr[] = $e->innertext;
}
// Combine the two arrays into a new two-dimensional array
for ($i = 0; $i < count($res); $i++) {
    $crr[] = explode('+', $arr[$i]);
    $crr[$i][] = $brr[$i];
}
return $crr;
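The script joins each title and href with '+' and later splits them again with explode, which would break on a title that itself contains '+'. As a minimal sketch of an alternative (not part of the original script), the rows can be built directly with PHP's built-in array_map. The sample titles, paths, and counts below are hypothetical stand-ins for the scraped values, and the variable names $links, $counts, and $rows are made up for illustration.

```php
<?php
// Hypothetical sample data standing in for the scraped values;
// in the script above these would come from $res and $span.
$links = [
    ['title' => 'Post A',    'href' => '/szy361/article/details/1'],
    ['title' => 'C++ notes', 'href' => '/szy361/article/details/2'],
];
$counts = ['(120)', '(85)'];

// Pair each link with its read count, producing [title, href, count] rows.
// array_map walks both arrays in parallel, so no separator character is needed.
$rows = array_map(function ($link, $count) {
    return [$link['title'], $link['href'], $count];
}, $links, $counts);

print_r($rows);
```

Because nothing is joined and split again, a title such as "C++ notes" stays intact in the resulting row.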
Further reading:
Getting Started with the PHP Simple HTML DOM Parser