Introduction to several methods of crawling pages with PHP cURL

Jul 20, 2016 am 11:11 AM

cURL is mainly used to fetch data. Other methods, such as fsockopen and file_get_contents, can also be used, but they are mostly suited to pages that can be accessed directly. Fetching pages with access control, or pages that are only available after logging in, is more difficult with them.

1. Use PHP's cURL module to retrieve a page

This example retrieves a local phpinfo page and stores the result in a variable.

The code is as follows:
<?php
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, "http://localhost/mytest/phpinfo.php");
curl_setopt($ch, CURLOPT_HEADER, false);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1); // if this line is commented out, the response is output directly
$result = curl_exec($ch);
curl_close($ch);
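
The example above does not check whether the request succeeded. As a minimal sketch, the same request could be wrapped with a timeout and an error check. The function names `build_fetch_options` and `fetch_page`, the timeout values, and the choice to follow redirects are assumptions made for illustration and are not part of the original example.

```php
<?php
// Hypothetical helper: collect the options for a simple GET request.
function build_fetch_options(string $url, int $timeoutSeconds = 10): array
{
    return [
        CURLOPT_URL            => $url,
        CURLOPT_HEADER         => false,           // body only, as in the example above
        CURLOPT_RETURNTRANSFER => true,            // return the body instead of printing it
        CURLOPT_FOLLOWLOCATION => true,            // assumption: follow redirects
        CURLOPT_CONNECTTIMEOUT => $timeoutSeconds, // assumption: limit connection time
        CURLOPT_TIMEOUT        => $timeoutSeconds, // assumption: limit total time
    ];
}

// Hypothetical helper: run the request and throw on a transport error.
function fetch_page(string $url, int $timeoutSeconds = 10): string
{
    $ch = curl_init();
    curl_setopt_array($ch, build_fetch_options($url, $timeoutSeconds));
    $body = curl_exec($ch);
    if ($body === false) {
        // curl_error() describes the last error on this handle
        throw new RuntimeException('cURL request failed: ' . curl_error($ch));
    }
    return $body; // the handle is released when $ch goes out of scope
}

// Usage (needs a reachable server):
// $html = fetch_page('http://localhost/mytest/phpinfo.php');
```

Keeping the options in a separate function makes them easy to inspect or reuse; whether redirects should be followed depends on the site being crawled.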


2. Use a proxy to crawl

Why use a proxy to crawl? Take Google as an example. If you fetch Google's data very frequently within a short period, Google will restrict your IP address and further requests will fail. When that happens, you can switch to a different proxy and crawl again.

The code is as follows:
<?php
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, "http://www.hzhuti.com");
curl_setopt($ch, CURLOPT_HEADER, false);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($ch, CURLOPT_HTTPPROXYTUNNEL, TRUE);
curl_setopt($ch, CURLOPT_PROXY, '125.21.23.6:8080'); // the proxy address must be passed as a string
// curl_setopt($ch, CURLOPT_PROXYUSERPWD, 'user:password'); // add this if the proxy requires a password
$result = curl_exec($ch);
curl_close($ch);
?>
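
Switching proxies after a failure can be sketched as a loop over a list of proxy addresses. The helper names `proxy_options` and `fetch_via_proxies` and the 10-second timeout are assumptions for illustration. The sketch leaves out CURLOPT_HTTPPROXYTUNNEL and only treats a transport-level failure as a reason to move on.

```php
<?php
// Hypothetical helper: options for fetching $url through one HTTP proxy.
function proxy_options(string $url, string $proxy, ?string $proxyUserPwd = null): array
{
    $options = [
        CURLOPT_URL            => $url,
        CURLOPT_HEADER         => false,
        CURLOPT_RETURNTRANSFER => true,
        CURLOPT_PROXY          => $proxy, // "host:port" as a string
        CURLOPT_TIMEOUT        => 10,     // assumption: give up on a slow proxy after 10 s
    ];
    if ($proxyUserPwd !== null) {
        $options[CURLOPT_PROXYUSERPWD] = $proxyUserPwd; // "user:password"
    }
    return $options;
}

// Hypothetical helper: try each proxy in turn and return the first body received.
function fetch_via_proxies(string $url, array $proxies): ?string
{
    foreach ($proxies as $proxy) {
        $ch = curl_init();
        curl_setopt_array($ch, proxy_options($url, $proxy));
        $body = curl_exec($ch);
        if ($body !== false) {
            return $body;
        }
        // this proxy failed; continue with the next one
    }
    return null; // every proxy failed, or the list was empty
}

// Usage (the second address is a placeholder):
// $html = fetch_via_proxies('http://www.hzhuti.com', ['125.21.23.6:8080', '10.0.0.2:3128']);
```

A site that blocks an IP address may still return a response body, so a real crawler would probably also inspect the status code with curl_getinfo($ch, CURLINFO_HTTP_CODE) before accepting the result.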

3. After posting the data, grab the data

Data submission deserves a separate discussion: cURL is often used for data interaction, so this case is important.

The code is as follows:
<?php
$ch = curl_init();
/* Note that the data to be submitted cannot be a two-dimensional (or deeper) array.
 * For example, array('name'=>serialize(array('tank','zhang')),'sex'=>1,'birth'=>'20101010') works,
 * but array('name'=>array('tank','zhang'),'sex'=>1,'birth'=>'20101010') will report an error. */
$data = array('name' => 'test', 'sex'=>1,'birth'=>'20101010');
curl_setopt($ch, CURLOPT_URL, 'http://localhost/mytest/curl/upload.php');
curl_setopt($ch, CURLOPT_POST, 1);
curl_setopt($ch, CURLOPT_POSTFIELDS, $data);
curl_exec($ch);
?>

In upload.php, print_r($_POST); is called, so cURL captures the content that upload.php outputs:
Array ( [name] => test [sex] => 1 [birth] => 20101010 )
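
The restriction on nested arrays applies when an array is handed to CURLOPT_POSTFIELDS directly. One possible workaround, shown here as a sketch rather than as the original author's method, is to encode the data into a string first with PHP's http_build_query(). The helper name `encode_post_fields` is hypothetical.

```php
<?php
// Hypothetical helper: turn a (possibly nested) array into a URL-encoded body.
function encode_post_fields(array $data): string
{
    // the explicit '&' separator avoids depending on the arg_separator.output ini setting
    return http_build_query($data, '', '&');
}

$data = array('name' => array('tank', 'zhang'), 'sex' => 1, 'birth' => '20101010');
$body = encode_post_fields($data);

// $body is a plain string, so it can be passed to cURL without the nested-array error:
// curl_setopt($ch, CURLOPT_POSTFIELDS, $body);
```

When CURLOPT_POSTFIELDS receives a string, the request is sent as application/x-www-form-urlencoded. When it receives an array, it is sent as multipart/form-data. The receiving script sees $_POST['name'] as an array in the string case.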

4. Grab some pages with page access control

3 methods of page access control
Zhang, published on 2010-10-12
Category: apache/nginx

We often see the following phenomenon.

[Figure: apache page access control]
Why carry out such control? It lets different people see different things and protects information. Although this kind of protection is relatively basic, it is still somewhat useful.

First, use the htpasswd command to generate a password file

The code is as follows:
[zhangy@BlackGhost test]$ htpasswd -c ./access tank    # generate a password file; -c creates a new file (see htpasswd -h)
New password:                # prompt for the password
Re-type new password:        # repeat the password
Adding password for user tank
[zhangy@BlackGhost test]$ cat access    # view the password file
tank:Uj5B3qIF/BNdI           # the username is in clear text; the password is stored as a hash

At this point the password file has been generated.

Second, page access control methods

1. Configure it by modifying httpd.conf or httpd-vhosts.conf

The code is as follows:
listen 10004
NameVirtualHost *:10004
<VirtualHost *:10004>
 DocumentRoot "/home/zhangy/www/test"
 ServerName *:10004
 BandwidthModule On
 ForceBandWidthModule On
 Bandwidth all 1024000
 MinBandwidth all 50000
 LargeFileLimit * 500 50000
 MaxConnection all 2
 ErrorLog "/home/zhangy/apache/blog.51yip.com.com-error.log"
 CustomLog "/home/zhangy/apache/blog.51yip.com-access.log" common
 # Look at the configuration below (the enclosing VirtualHost and Directory containers are assumed here)
 <Directory "/home/zhangy/www/test">
  AuthType Basic
  AuthName "access test"
  AuthUserFile /home/zhangy/www/test/access
  Require valid-user
 </Directory>
</VirtualHost>

2. Use a .htaccess file for control

Create a .htaccess file under the root directory of test.

The code is as follows:
[zhangy@BlackGhost test]$ vi .htaccess     # open the file and add the permission settings
[zhangy@BlackGhost test]$ cat .htaccess    # the content of .htaccess follows
AuthType Basic
AuthName "access test"
AuthUserFile /home/zhangy/www/test/access
Require valid-user
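
Pages protected this way use HTTP Basic authentication, which cURL can supply through CURLOPT_USERPWD. The following is a sketch only: the helper names `basic_auth_options` and `basic_auth_header`, the URL http://localhost:10004/ (inferred from the listen directive above), and the password are assumptions.

```php
<?php
// Hypothetical helper: options for a page protected by HTTP Basic authentication.
function basic_auth_options(string $url, string $user, string $password): array
{
    return [
        CURLOPT_URL            => $url,
        CURLOPT_RETURNTRANSFER => true,
        CURLOPT_HTTPAUTH       => CURLAUTH_BASIC,
        CURLOPT_USERPWD        => $user . ':' . $password, // "user:password"
    ];
}

// For reference: the header that Basic authentication puts on the wire.
function basic_auth_header(string $user, string $password): string
{
    return 'Authorization: Basic ' . base64_encode($user . ':' . $password);
}

// Usage (needs the protected site to be running; the password is a placeholder):
// $ch = curl_init();
// curl_setopt_array($ch, basic_auth_options('http://localhost:10004/', 'tank', 'your-password'));
// $page = curl_exec($ch);
```

Because the credentials are only base64-encoded, not encrypted, this kind of protection is best combined with HTTPS.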

3. Access control is also possible without a password file

The code is as follows:
<?php
define('ADMIN_USERNAME','tank'); // Admin Username
define('ADMIN_PASSWORD','tank'); // Admin Password

// login check
if (!isset($_SERVER['PHP_AUTH_USER']) || !isset($_SERVER['PHP_AUTH_PW']) ||
 $_SERVER['PHP_AUTH_USER'] != ADMIN_USERNAME || $_SERVER['PHP_AUTH_PW'] != ADMIN_PASSWORD) {
 header('WWW-Authenticate: Basic realm="access test"'); // single quotes, so the inner double quotes need no escaping
 header("HTTP/1.0 401 Unauthorized");

 echo <<<EOB
 <html><body>
 <h1>Rejected!</h1>
 <big>Wrong Username or Password!</big>
 </body></html>
EOB;
 exit;
}
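
The credential check can also be pulled into a small function, which makes it easier to test. The sketch below is a variation on the check above, not the original author's code. The function name `is_authorized` is hypothetical, and hash_equals() is swapped in for the `!=` comparison so that the comparison takes constant time.

```php
<?php
// Hypothetical helper: check Basic-auth credentials from a $_SERVER-like array.
function is_authorized(array $server, string $expectedUser, string $expectedPassword): bool
{
    if (!isset($server['PHP_AUTH_USER'], $server['PHP_AUTH_PW'])) {
        return false; // the client sent no credentials
    }
    // hash_equals() compares two strings in constant time
    $userOk = hash_equals($expectedUser, (string) $server['PHP_AUTH_USER']);
    $passOk = hash_equals($expectedPassword, (string) $server['PHP_AUTH_PW']);
    return $userOk && $passOk;
}

// Usage inside a page:
// if (!is_authorized($_SERVER, 'tank', 'tank')) { /* send the 401 headers and exit */ }
```

Passing the server array as a parameter, instead of reading $_SERVER inside the function, is a design choice that lets the check run outside a web request.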

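List of cURL-related functions:

curl_init: initialize a cURL session
curl_setopt: set an option for a cURL transfer
curl_exec: execute a cURL session
curl_close: close a cURL session
curl_version: return the current cURL version

curl_init: initialize a cURL session
Description
int curl_init ([string url])
The curl_init() function initializes a new session and returns a cURL handle for use with the curl_setopt(), curl_exec(), and curl_close() functions. If the optional parameter is supplied, the CURLOPT_URL option is set to its value; otherwise you can set it manually with curl_setopt(). (The signature above is the one quoted from the old manual; since PHP 8.0 the function returns a CurlHandle object rather than a resource.)

Example 1. Initialize a new cURL session and fetch a web page

The code is as follows:
<?php
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, "http://www.zend.com/");
curl_setopt($ch, CURLOPT_HEADER, 0);
curl_exec($ch);
curl_close($ch);
?>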
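
As a small illustration of the optional curl_init() parameter and of curl_version(), the sketch below passes the URL directly when creating the handle and prints the libcurl version. The helper names `init_session` and `curl_version_string` are hypothetical, and the request itself is left commented out.

```php
<?php
// Hypothetical helper: create a handle with the URL passed straight to curl_init().
function init_session(string $url)
{
    $ch = curl_init($url);                          // same effect as setting CURLOPT_URL afterwards
    curl_setopt($ch, CURLOPT_HEADER, 0);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true); // return the body instead of printing it
    return $ch;
}

// Hypothetical helper: report the libcurl version that PHP is using.
function curl_version_string(): string
{
    $info = curl_version(); // associative array describing the cURL build
    return $info['version'];
}

$ch = init_session('http://www.zend.com/');
echo 'libcurl ' . curl_version_string() . PHP_EOL;
// $page = curl_exec($ch); // uncomment to perform the request
```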

