PHP: article collection code using simple_html_dom.php and regular expressions

Jul 21, 2016 pm 03:41 PM

The code is as follows:

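<?php
// Include the PHP Simple HTML DOM class library file
include_once('./simplehtmldom/simple_html_dom.php');

// Fetch the HTML of a page with cURL
function getwebcontent($url) {
    $ch = curl_init();
    $timeout = 10; // connection timeout in seconds
    curl_setopt($ch, CURLOPT_URL, $url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true); // return the body instead of printing it
    curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, $timeout);
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true); // follow redirects
    $contents = curl_exec($ch);
    curl_close($ch);
    // curl_exec() returns false on failure; return an empty string in that case
    return $contents === false ? '' : trim($contents);
}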


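// Fetch the list page that holds the article titles and URLs
$string = getwebcontent('http://www.babytree.com/learn/zhunbeihuaiyun/jijibeiyun/2');

// Regular-expression matching: get each title and address.
// The pattern needs two capture groups: the loop below reads the article path
// from group 1 and the title from group 2. The HTML tags that belonged to the
// pattern are missing from this listing, so it will not match as shown.
preg_match_all("/(.*)/", $string, $out, PREG_SET_ORDER);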

// Collect the titles and build the absolute article URLs
foreach ($out as $match) {
    $article['title'][] = $match[2];
    $article['link'][] = "http://www.babytree.com/learn/article/" . $match[1];
}

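// Fetch each article's content from its URL
foreach ($article['link'] as $key => $value) {
    $html = file_get_html($value);
    // file_get_html() returns false when the page cannot be loaded
    $div = $html ? $html->find('div[id=pagenum_0]') : array();
    // Store an empty string on failure so titles and contents stay aligned
    $article['content'][] = isset($div[0]) ? $div[0]->innertext : '';
}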
// Transcode the titles. In principle this step should not be needed, because
// everything is UTF-8 to begin with, but in practice the files could not be
// saved without transcoding.
foreach ($article['title'] as $key => $value) {
    $article['title'][$key] = iconv('utf-8', 'gbk', $value); // UTF-8 to GBK
}
// Save each article to a text file named after its title
$num = count($article['title']);
for ($i = 0; $i < $num; $i++) {
    file_put_contents("{$article['title'][$i]}.txt", $article['content'][$i]);
}

/* I had meant to post this before 12 o'clock, but it is already 3:30, so
   let's count it as yesterday's post.
   Regular expressions would really be the best and fastest way to obtain the
   article content. They are good, but they are also really difficult! So I
   looked around a little and found that many people online use PHP Simple
   HTML DOM as well. It is somewhat slower, but the result is still good.
   From including the class library file to writing the txt files takes about
   7 to 8 seconds. There is room for further optimization, especially by
   writing a regular expression for the article content, nasty as that is.
   You can look into it yourselves. */
    ?>
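
The title-and-link step can be tried without touching the network. The following is only a minimal sketch under stated assumptions: the list markup in `$sampleHtml`, the pattern in `$pattern`, and the `safe_filename()` helper are all hypothetical and chosen for illustration; they are not the real babytree.com markup or the author's original pattern. It shows how `preg_match_all()` with `PREG_SET_ORDER` yields one entry per match, with the article id in group 1 and the title in group 2.

```php
<?php
// Hypothetical list markup; the real page markup is not shown in the article.
$sampleHtml = <<<HTML
<ul>
  <li><a href="/learn/article/101">First article</a></li>
  <li><a href="/learn/article/102">Second: tips/tricks?</a></li>
</ul>
HTML;

// Assumed pattern: group 1 captures the article id, group 2 the link text.
// '#' is used as the delimiter so the slashes in the path need no escaping,
// and the lazy (.*?) stops at the first closing </a>.
$pattern = '#<li><a href="/learn/article/(\d+)">(.*?)</a></li>#';

// PREG_SET_ORDER gives one array per match: [0] full match, [1] id, [2] title.
preg_match_all($pattern, $sampleHtml, $out, PREG_SET_ORDER);

$article = array('title' => array(), 'link' => array());
foreach ($out as $match) {
    $article['title'][] = $match[2];
    $article['link'][]  = 'http://www.babytree.com/learn/article/' . $match[1];
}

// Hypothetical helper: replace characters that are not allowed in file names
// on common file systems before a title is used as a file name.
function safe_filename($title) {
    return preg_replace('#[\\\\/:*?"<>|]#', '_', $title);
}

foreach ($article['title'] as $i => $title) {
    echo safe_filename($title) . '.txt => ' . $article['link'][$i] . PHP_EOL;
}
```

Cleaning the title first is a deliberate choice in this sketch: a title containing `/` or `?` would otherwise make `file_put_contents()` fail or write to an unexpected path.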