Crawling web pages with PHP

Jul 13, 2016, 10:28 AM

Fetching the content of a web page with PHP is useful in everyday development. For example, it can serve as a simple content collector that extracts part of a page. Once the page has been fetched, regular expressions can filter it down to the content you want. The following are three commonly used ways to fetch web page content with PHP.
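As a rough sketch of the regular-expression filtering mentioned above, the following pulls the page title and the link targets out of an HTML string. The function names extract_title and extract_links and the sample markup are illustrative assumptions, not part of the original article. For anything beyond simple patterns, an HTML parser such as DOMDocument is usually more robust than regular expressions.

```php
<?php
// Hypothetical helpers (names are not from the original article).

// Return the text of the first <title> element, or null if none is found.
function extract_title(string $html): ?string
{
    if (preg_match('/<title[^>]*>(.*?)<\/title>/is', $html, $m)) {
        return trim(html_entity_decode($m[1], ENT_QUOTES, 'UTF-8'));
    }
    return null;
}

// Return the href value of every <a> tag, in document order.
function extract_links(string $html): array
{
    preg_match_all('/<a\s[^>]*href\s*=\s*["\']([^"\']+)["\']/i', $html, $m);
    return $m[1];
}

// Sample markup standing in for a fetched page.
$html = '<html><head><title> Demo page </title></head>'
      . '<body><a href="/a.html">A</a> <a class="x" href="/b.html">B</a></body></html>';

echo extract_title($html), "\n";
print_r(extract_links($html));
```

In practice $html would be the string returned by one of the fetching methods described in this article.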
1.file_get_contents
PHP code

<?php
$url = "http://www.phpzixue.cn";
$contents = file_get_contents($url);
// If Chinese characters come out garbled, convert the encoding first:
// $contents = iconv("gb2312", "utf-8", $contents);
echo $contents;
?>
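The snippet above does not check whether the request succeeded. A minimal sketch that adds a timeout, a failure check, and the encoding conversion might look like the following. The helper name fetch_as_utf8 and its default arguments are assumptions made for illustration; the source encoding depends on the page being fetched.

```php
<?php
// Hypothetical helper (not from the original article): fetch a page with
// file_get_contents and return it converted to UTF-8, or null on failure.
// $fromEncoding is an assumption about the remote page.
function fetch_as_utf8(string $source, string $fromEncoding = 'GB2312', int $timeout = 5): ?string
{
    // The timeout applies to http:// and https:// sources.
    $context = stream_context_create(['http' => ['timeout' => $timeout]]);

    // file_get_contents returns false on failure; @ hides the warning it also raises.
    $raw = @file_get_contents($source, false, $context);
    if ($raw === false) {
        return null;
    }

    // iconv returns false when the input is not valid in the source encoding.
    $converted = @iconv($fromEncoding, 'UTF-8', $raw);
    return $converted === false ? null : $converted;
}

// Usage (performs a network request, so it is left commented out here):
// $page = fetch_as_utf8('http://www.phpzixue.cn');
// echo $page === null ? "Fetch or conversion failed\n" : $page;
```

Returning null on failure lets the caller distinguish a failed fetch from an empty page.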
2.curl
PHP code
<?php
$url = "http://www.phpzixue.cn";
$ch = curl_init();
$timeout = 5;
curl_setopt($ch, CURLOPT_URL, $url);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, $timeout);
// For pages that require HTTP authentication, add the following two lines
// (US_NAME and US_PWD stand for constants holding your own credentials):
// curl_setopt($ch, CURLOPT_HTTPAUTH, CURLAUTH_ANY);
// curl_setopt($ch, CURLOPT_USERPWD, US_NAME . ":" . US_PWD);
$contents = curl_exec($ch);
curl_close($ch);
echo $contents;
?>
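curl_exec returns false when the transfer fails, so echoing its result directly hides errors. The sketch below wraps the same calls in a function that also reports the HTTP status code and the curl error message. The name fetch_with_curl, the shape of the returned array, and the overall transfer timeout are assumptions chosen for illustration.

```php
<?php
// Hypothetical helper (not from the original article): fetch a URL with curl
// and report failures instead of echoing false.
function fetch_with_curl(string $url, int $timeout = 5): array
{
    $ch = curl_init();
    curl_setopt_array($ch, [
        CURLOPT_URL            => $url,
        CURLOPT_RETURNTRANSFER => true,         // return the body instead of printing it
        CURLOPT_CONNECTTIMEOUT => $timeout,     // seconds allowed for connecting
        CURLOPT_TIMEOUT        => $timeout * 2, // seconds for the whole transfer (assumed value)
        CURLOPT_FOLLOWLOCATION => true,         // follow HTTP redirects
    ]);

    $body = curl_exec($ch);

    return [
        'body'   => $body === false ? null : $body,
        // 0 when no HTTP response was received at all.
        'status' => (int) curl_getinfo($ch, CURLINFO_HTTP_CODE),
        'error'  => $body === false ? curl_error($ch) : '',
    ];
}

// Usage (performs a network request, so it is left commented out here):
// $result = fetch_with_curl('http://www.phpzixue.cn');
// echo $result['body'] ?? ('Request failed: ' . $result['error'] . "\n");
```

Checking 'status' as well as 'body' matters because a 404 or 500 response still returns a body.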
3.fopen->fread->fclose
PHP code
<?php
$handle = fopen("http://www.phpzixue.cn", "rb");
$contents = "";
do {
    $data = fread($handle, 1024);
    if (strlen($data) == 0) {
        break;
    }
    $contents .= $data;
} while (true);
fclose($handle);
echo $contents;
?>
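Reading in chunks is mainly useful when you want control over how much is read. As a sketch of that idea, the following variant checks that fopen succeeded and stops after a maximum number of bytes. The helper name read_limited and its parameters are assumptions for illustration, not part of the original article.

```php
<?php
// Hypothetical helper (not from the original article): read a file or URL in
// chunks, stopping once $maxBytes have been collected.
function read_limited(string $source, int $maxBytes, int $chunkSize = 1024): ?string
{
    $handle = @fopen($source, 'rb');
    if ($handle === false) {
        return null; // the file or URL could not be opened
    }

    $contents = '';
    while (!feof($handle) && strlen($contents) < $maxBytes) {
        // Never request more than what is still allowed.
        $data = fread($handle, min($chunkSize, $maxBytes - strlen($contents)));
        if ($data === false || $data === '') {
            break;
        }
        $contents .= $data;
    }
    fclose($handle);

    return $contents;
}

// Usage (performs a network request, so it is left commented out here):
// echo read_limited('http://www.phpzixue.cn', 64 * 1024);
```

Capping the size this way avoids loading an unexpectedly large response into memory.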
Note:
1. file_get_contents and fopen can open remote URLs only when allow_url_fopen is enabled. To enable it, edit php.ini and set allow_url_fopen = On. When allow_url_fopen is off, neither fopen nor file_get_contents can open remote files.
2. To use curl, the curl extension must be enabled in your PHP environment. On Windows, edit php.ini and remove the semicolon in front of extension=php_curl.dll (on PHP 7.2 and later the line may read extension=curl), then copy ssleay32.dll and libeay32.dll to C:\WINDOWS\system32. On Linux, install the curl extension.
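Both requirements can be checked at runtime before choosing a method. The following is a small sketch; the function name available_fetch_methods is an assumption for illustration.

```php
<?php
// Hypothetical helper (not from the original article): report which of the
// three approaches can fetch remote URLs in the current PHP configuration.
function available_fetch_methods(): array
{
    $methods = [];

    // allow_url_fopen governs remote access for both file_get_contents and fopen.
    if (filter_var(ini_get('allow_url_fopen'), FILTER_VALIDATE_BOOLEAN)) {
        $methods[] = 'file_get_contents';
        $methods[] = 'fopen';
    }

    // The curl functions exist only when the curl extension is loaded.
    if (extension_loaded('curl')) {
        $methods[] = 'curl';
    }

    return $methods;
}

print_r(available_fetch_methods());
```

An empty result means neither setting is in place and php.ini needs to be changed as described in the note.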