PHP regex code for article scraping

WBOY
Release: 2016-06-01 12:22:29
The code is as follows:
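<?php
// Fetch the HTML of a page
function getwebcontent($url) {
    $ch = curl_init();
    $timeout = 10; // connection timeout in seconds
    curl_setopt($ch, CURLOPT_URL, $url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1); // return the body instead of printing it
    curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, $timeout);
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, 1); // follow redirects
    $contents = trim((string) curl_exec($ch)); // curl_exec() returns false on failure
    curl_close($ch);
    return $contents;
}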


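// Get the titles and URLs
$string = getwebcontent('http://www.***.com/learn/zhunbeihuaiyun/jijibeiyun/2');
// Regex match on each list item to get the title and address.
// The HTML tags inside this pattern did not survive ("…" marks the gap).
// The loop below needs group 1 to capture the link path and group 2 the title.
preg_match_all("/…(.*)/", $string, $out, PREG_SET_ORDER);
$article = ['title' => [], 'link' => [], 'content' => []];
foreach ($out as $match) {
    $article['title'][] = $match[2];
    $article['link'][] = "http://www.***.com/learn/article/" . $match[1];
}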
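// Fetch each article's content from its URL
foreach ($article['link'] as $key => $link) {
    $content_html = getwebcontent($link);
    // The tags that delimited the article body in this pattern did not survive ("…" marks the gap).
    // [\s\S]*? lazily matches any characters, including newlines.
    preg_match("/…[\s\S]*?/", $content_html, $matches);
    $article['content'][$key] = $matches[0] ?? ''; // empty string when nothing matched
}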
// Without transcoding, the content really cannot be saved as files
foreach ($article['title'] as $key => $value) {
    $article['title'][$key] = iconv('utf-8', 'gbk', $value); // transcode UTF-8 to GBK
}
// Save to files, one .txt file per article, named after its title
$num = count($article['title']);
for ($i = 0; $i < $num; $i++) {
    file_put_contents("{$article['title'][$i]}.txt", $article['content'][$i]);
}
?>
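
The two regex steps in the script can be tried without any network access. The following is a minimal sketch, not the original patterns: the `<li><a href=...>` list markup, the `<div class="content">` wrapper, and the `www.example.com` host are all assumptions made for illustration, since the real site's HTML is not shown here.

```php
<?php
// Hypothetical list page; the real site's markup is unknown.
$listHtml = <<<HTML
<ul>
<li><a href="/learn/article/101.html">First title</a></li>
<li><a href="/learn/article/102.html">Second title</a></li>
</ul>
HTML;

// Group 1 captures the part of the href after the assumed path prefix,
// group 2 captures the link text. Lazy quantifiers (.*?) stop each group
// at the nearest following delimiter instead of running to the end of the line.
$listPattern = '#<li><a href="/learn/article/(.*?)">(.*?)</a></li>#';
preg_match_all($listPattern, $listHtml, $out, PREG_SET_ORDER);

// With PREG_SET_ORDER, each element of $out is one full match:
// [0] = whole match, [1] = first group, [2] = second group.
$article = ['title' => [], 'link' => []];
foreach ($out as $match) {
    $article['title'][] = $match[2];
    $article['link'][]  = 'http://www.example.com/learn/article/' . $match[1]; // placeholder host
}

// Hypothetical article page: the body is assumed to sit in <div class="content">.
$pageHtml = "<html><body><div class=\"content\">\nLine one\nLine two\n</div></body></html>";

// [\s\S]*? matches any character, newlines included, as few times as possible,
// so the capture ends at the first closing </div>.
$body = '';
if (preg_match('#<div class="content">([\s\S]*?)</div>#', $pageHtml, $matches)) {
    $body = trim($matches[1]);
}

print_r($article);
echo $body, "\n";
```

Using `#` as the pattern delimiter avoids having to escape the `/` characters in the closing tags. Checking the return value of `preg_match` before reading `$matches` keeps the script from failing on pages where the assumed wrapper is missing.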
Source: php.cn