Home > Backend Development > PHP Tutorial > Take you ten minutes to understand the process of implementing a crawler in PHP

Take you ten minutes to understand the process of implementing a crawler in PHP

烟雨青岚
Release: 2023-04-09 08:36:01
forward
3770 people have browsed it

Take you ten minutes to understand the process of implementing a crawler in PHP

Text information

We try to get the table information. Here, we use the class schedule of a certain school instead. :

Take you ten minutes to understand the process of implementing a crawler in PHP

## Next we will add the code:

a.php

 <?php  header( "Content-type:text/html;Charset=utf-8" ); 
$ch = curl_init();        $url ="表的链接";
        curl_setopt ( $ch , CURLOPT_USERAGENT ,"Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/53.0.2785.113 Safari/537.36" );
        curl_setopt($ch,CURLOPT_URL,$url);
        curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);        $content=curl_exec($ch);
        preg_match_all("/<td rowspan=\"\d\">(.*?)<\/td>\n<td rowspan=\"\d\">(.*?)<\/td><td rowspan=\"\d\" align=\"\w+\">(.*?)<\/td><td rowspan=\"\d\" align=\"\w+\">(.*?)<\/td><td>(.*?)<\/td>\n<td>(.*?)<\/td><td>(.*?)<\/td>/",$content,$matchs,PREG_SET_ORDER);//匹配该表所用的正则
        var_dump($matchs);
Copy after login

Then let’s run it:

Take you ten minutes to understand the process of implementing a crawler in PHP

The class schedule was successfully obtained;

Picture acquisition

Absolute link

We take the homepage of Baidu Gallery as an example


Take you ten minutes to understand the process of implementing a crawler in PHPb.php

  <?php  header( "Content-type:text/html;Charset=utf-8" );  


    $ch = curl_init();    $url="http://image.baidu.com/";
    curl_setopt ( $ch , CURLOPT_USERAGENT ,"Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/53.0.2785.113 Safari/537.36" );
    curl_setopt($ch,CURLOPT_URL,$url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);    $content=curl_exec($ch);    $string=file_get_contents($url); 
    preg_match_all("/<img ([^ alt="Take you ten minutes to understand the process of implementing a crawler in PHP" >]*)\s*src=(&#39;|\")([^&#39;\"]+)(&#39;|\")/", 
                    $string,$matches);    $new_arr=array_unique($matches[3]);     foreach($new_arr as $key){ 
        echo "<img  src=$key alt="Take you ten minutes to understand the process of implementing a crawler in PHP" >";
     }
Copy after login

Then, we get the following page:
Take you ten minutes to understand the process of implementing a crawler in PHP

Relative links

Most of the links to pictures in Baidu Gallery are absolute links, so when we encounter web page pictures that are relative links time, how should we deal with it? In fact, it is very simple. We only need to change the loop part to


Take you ten minutes to understand the process of implementing a crawler in PHP

Then we can also output the image in the browser;

Thank you for reading, I hope Everyone benefits.

Recommended tutorial: "

php tutorial"

The above is the detailed content of Take you ten minutes to understand the process of implementing a crawler in PHP. For more information, please follow other related articles on the PHP Chinese website!

Related labels:
source:csdn.net
Statement of this Website
The content of this article is voluntarily contributed by netizens, and the copyright belongs to the original author. This site does not assume corresponding legal responsibility. If you find any content suspected of plagiarism or infringement, please contact admin@php.cn
Latest Issues
php data acquisition?
From 1970-01-01 08:00:00
0
0
0
PHP extension intl
From 1970-01-01 08:00:00
0
0
0
How to learn php well
From 1970-01-01 08:00:00
0
0
0
Popular Tutorials
More>
Latest Downloads
More>
Web Effects
Website Source Code
Website Materials
Front End Template