// $type is assumed to be defined earlier in the script
$ch = curl_init();
$c_url = 'http://www.baidu.com';
$c_url_data = "product_&type=" . $type;
curl_setopt($ch, CURLOPT_URL, $c_url);
curl_setopt($ch, CURLOPT_POST, 1);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
curl_setopt($ch, CURLOPT_POSTFIELDS, $c_url_data);
echo $result = curl_exec($ch);
curl_close($ch);
unset($ch);
This article explains the php_curl library and shows how to use it effectively.
Introduction
When writing PHP scripts you may run into this problem: how do you fetch content from another site? There are several solutions. The simplest is PHP's fopen() function, but fopen() by itself gives you little control over the request. Suppose you want to build a "web crawler" and need to set the client description the crawler reports (IE, Firefox), or fetch content with different request methods such as POST and GET. These requirements are hard to meet with a plain fopen() call.
To solve this problem we can use PHP's curl extension. It is usually included in the installation package by default, and you can use it to fetch content from other sites, among other things.
Note: the code in this article requires the php_curl extension. Check the output of phpinfo(); if curl support is listed as enabled, the extension is available.
1. Enable curl library support for PHP under Windows:
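Open php.ini and remove the leading semicolon (;) from the line extension=php_curl.dll (on PHP 7.2 and later the line reads extension=curl), then restart the web server.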
2. Enable curl library support for PHP under Linux:
Add --with-curl after ./configure when compiling PHP.
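Besides reading phpinfo(), a script can check for the extension at runtime. The following is a minimal sketch; the helper name curl_is_available() is made up for this illustration, while extension_loaded(), function_exists() and curl_version() are standard PHP functions.

```php
<?php
// Hypothetical helper: true when the curl extension is loaded in this runtime.
function curl_is_available(): bool
{
    return extension_loaded('curl') && function_exists('curl_init');
}

if (curl_is_available()) {
    // curl_version() returns an array describing the underlying libcurl build.
    $info = curl_version();
    echo 'curl enabled, libcurl ' . $info['version'] . PHP_EOL;
} else {
    echo 'curl extension is not loaded' . PHP_EOL;
}
```

A check like this is handy at the top of a script that must fail with a clear message rather than a fatal "undefined function" error.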
In this article we look at how to use the curl library and at some of its other useful features. First, though, we start with the most basic usage.
Basic usage:
The first step is to create a new curl session with the curl_init() function:
<?php
// create a new curl resource
$ch = curl_init();
?>
We now have a curl session. To fetch the content of a URL, the next step is to pass that URL to the curl_setopt() function:
<?php
// set URL and other appropriate options
curl_setopt($ch, CURLOPT_URL, "http://www.google.com/");
?>
That completes the setup. Calling curl_exec() now makes curl fetch the content at the URL and print it:
<?php
// grab URL and pass it to the browser
curl_exec($ch);
?>
Finally, close the current curl session:
<?php
// close curl resource, and free up system resources
curl_close($ch);
?>
Here is the complete example:
<?php
// create a new curl resource
$ch = curl_init();

// set URL and other appropriate options
curl_setopt($ch, CURLOPT_URL, "http://www.google.nl/");

// grab URL and pass it to the browser
curl_exec($ch);

// close curl resource, and free up system resources
curl_close($ch);
?>
We have just fetched the content of another site and sent it straight to the browser. Is there a way to capture that content and control what gets output? Certainly. If you want curl_exec() to return the content instead of printing it, set the CURLOPT_RETURNTRANSFER option of curl_setopt() to true (or any non-zero value). The complete code:
<?php
// create a new curl resource
$ch = curl_init();

// set URL and other appropriate options
curl_setopt($ch, CURLOPT_URL, "http://www.google.nl/");
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);

// grab URL, and return output
$output = curl_exec($ch);

// close curl resource, and free up system resources
curl_close($ch);

// Replace 'Google' with 'PHPit'
$output = str_replace('Google', 'PHPit', $output);

// Print output
echo $output;
?>
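With CURLOPT_RETURNTRANSFER set, curl_exec() returns false when the transfer fails, so it is worth checking the result before using it. The sketch below wraps that check in a function; the name fetch_body() and the choice to throw a RuntimeException are assumptions made for this illustration, while curl_errno() and curl_error() are standard curl functions.

```php
<?php
// Hypothetical helper: fetches $url and returns the body, or throws on failure.
function fetch_body(string $url): string
{
    $ch = curl_init();
    curl_setopt($ch, CURLOPT_URL, $url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);

    $output = curl_exec($ch);
    if ($output === false) {
        // curl_errno() gives libcurl's error number, curl_error() its message.
        throw new RuntimeException(
            'curl error ' . curl_errno($ch) . ': ' . curl_error($ch)
        );
    }

    // The handle is released when $ch goes out of scope.
    return $output;
}
```

A caller could then write $output = fetch_body("http://www.google.nl/"); inside a try/catch block and run str_replace() only on a successful result.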
In the two examples above, you may have noticed that setting different options with curl_setopt() produces different results. This is what makes curl powerful. Let's look at what these options mean.
CURL related options:
If you have read the curl_setopt() page in the PHP manual, you will have noticed the long list of options on it. We cannot cover them all here; see the PHP manual for the full list. This article covers only some of the commonly used and more interesting ones.
The first interesting option is CURLOPT_FOLLOWLOCATION. When it is set to true, curl follows any redirect the server sends. For example, if you request a PHP page that contains a redirect, curl fetches the content from http://new_url instead of returning the redirect response. The complete code:
<?php
// create a new curl resource
$ch = curl_init();

// set URL and other appropriate options
curl_setopt($ch, CURLOPT_URL, "http://www.google.com/");
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);

// grab URL, and print
curl_exec($ch);
?>
If Google sends a redirect, the example above goes on to fetch the content at the redirected URL. Two options are related to this one: CURLOPT_MAXREDIRS and CURLOPT_AUTOREFERER.
CURLOPT_MAXREDIRS sets the maximum number of redirects to follow; once that limit is reached, curl stops following them. If CURLOPT_AUTOREFERER is set to true, curl automatically adds a Referer header to each request it makes while following a redirect. It may not matter often, but it is very useful in certain cases.
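The three redirect options can be set together with curl_setopt_array(). This is only a sketch: the helper name redirect_options() and the limit of 5 are assumptions for illustration, and the request itself is left commented out because it needs network access.

```php
<?php
// Hypothetical helper: builds the redirect-related options as one array.
// $maxRedirects caps how many redirects curl will follow.
function redirect_options(int $maxRedirects): array
{
    return [
        CURLOPT_FOLLOWLOCATION => true,
        CURLOPT_MAXREDIRS      => $maxRedirects,
        CURLOPT_AUTOREFERER    => true,
        CURLOPT_RETURNTRANSFER => true,
    ];
}

// Usage sketch (performs a real request, so it is commented out):
// $ch = curl_init('http://www.google.com/');
// curl_setopt_array($ch, redirect_options(5));
// $output = curl_exec($ch);
// echo curl_getinfo($ch, CURLINFO_REDIRECT_COUNT), ' redirect(s), ended at ',
//      curl_getinfo($ch, CURLINFO_EFFECTIVE_URL), PHP_EOL;
```

After the transfer, CURLINFO_REDIRECT_COUNT and CURLINFO_EFFECTIVE_URL report how many redirects were followed and where the request ended up.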
The next option is CURLOPT_POST. This is a very useful feature because it lets you send a POST request instead of a GET request, which essentially means you can submit forms on other pages without actually filling them in. The example below shows what I mean:
<?php
// create a new curl resource
$ch = curl_init();

// set URL and other appropriate options
curl_setopt($ch, CURLOPT_URL, "http://projects/phpit/content/using%20curl%20php/demos/handle_form.php");

// Do a POST
$data = array('name' => 'Dennis', 'surname' => 'Pallett');
curl_setopt($ch, CURLOPT_POST, true);
curl_setopt($ch, CURLOPT_POSTFIELDS, $data);

// grab URL, and print
curl_exec($ch);
?>

And the handle_form.php file:

<?php
echo 'Form variables I received:';
echo '<pre>';
print_r($_POST);
echo '</pre>';
?>
As you can see, this makes it really easy to submit forms, and it is a great way to test your forms without filling them in by hand every time.
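One detail worth knowing: when CURLOPT_POSTFIELDS is given an array, curl sends the body as multipart/form-data, whereas a URL-encoded string is sent as application/x-www-form-urlencoded, which is what a plain HTML form submits by default. The sketch below encodes the same fields with http_build_query(); the URL in the commented usage is a placeholder, not a real endpoint.

```php
<?php
$data = array('name' => 'Dennis', 'surname' => 'Pallett');

// URL-encode the fields so they are sent as application/x-www-form-urlencoded.
$encoded = http_build_query($data);
echo $encoded . PHP_EOL; // prints "name=Dennis&surname=Pallett"

// Usage sketch (performs a real request, so it is commented out):
// $ch = curl_init('http://example.com/handle_form.php'); // placeholder URL
// curl_setopt($ch, CURLOPT_POST, true);
// curl_setopt($ch, CURLOPT_POSTFIELDS, $encoded);
// curl_exec($ch);
```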
The CURLOPT_CONNECTTIMEOUT option sets how long, in seconds, curl may spend trying to connect. This is a very important option. If you set it too low, requests may fail before a connection can be made, but if you set it too high, the PHP script may stall while waiting. A related option is CURLOPT_TIMEOUT, which sets the maximum time, in seconds, that the whole curl operation may take. If you set this to a very small value, downloaded pages may be incomplete, because they take a while to download.
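The two timeouts are typically set together, and a timeout can be told apart from other failures by the error number. The following is a sketch under stated assumptions: the function name fetch_with_timeouts() and the constant CURL_TIMEOUT_ERRNO are invented for this example, and 28 is the error number libcurl uses for a timed-out operation.

```php
<?php
// libcurl reports error number 28 when an operation times out.
const CURL_TIMEOUT_ERRNO = 28;

// Hypothetical helper: returns the body as a string, or false on failure.
function fetch_with_timeouts(string $url, int $connectSeconds, int $totalSeconds)
{
    $ch = curl_init();
    curl_setopt($ch, CURLOPT_URL, $url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    // Give up if no connection is established within $connectSeconds.
    curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, $connectSeconds);
    // Give up if the whole transfer takes longer than $totalSeconds.
    curl_setopt($ch, CURLOPT_TIMEOUT, $totalSeconds);

    $output = curl_exec($ch);
    if ($output === false && curl_errno($ch) === CURL_TIMEOUT_ERRNO) {
        echo 'Request timed out' . PHP_EOL;
    }
    return $output;
}
```

For example, fetch_with_timeouts("http://www.google.com/", 5, 30) would allow 5 seconds to connect and 30 seconds for the whole request; suitable values depend on the sites being fetched.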
The last option is CURLOPT_USERAGENT, which lets you set the client name sent with the request, such as a web spider or IE 6.0. Sample code:
<?php
// create a new curl resource
$ch = curl_init();

// set URL and other appropriate options
curl_setopt($ch, CURLOPT_URL, "http://sc.jb51.net/");
curl_setopt($ch, CURLOPT_USERAGENT, 'My custom web spider/0.1');
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);

// grab URL, and print
curl_exec($ch);
?>
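Now that we have covered the most interesting options, let's look at the curl_getinfo() function and see what it can do for us.
Getting information about the page:
The curl_getinfo() function gives you all kinds of information about the request and the page that was retrieved. You can ask for a single piece of information by passing a second argument, or omit the second argument to get everything back as an associative array, as in the following example: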
<?php
// create a new curl resource
$ch = curl_init();

// set URL and other appropriate options
curl_setopt($ch, CURLOPT_URL, "http://www.google.com");
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
curl_setopt($ch, CURLOPT_FILETIME, true);

// grab URL
$output = curl_exec($ch);

// Print info
echo '<pre>';
print_r(curl_getinfo($ch));
echo '</pre>';
?>
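Most of the information returned concerns the request itself, such as how long the request took and the headers that came back. Some of it describes the page, such as the size of its content and when it was last modified.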
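To give a feel for the array that curl_getinfo() returns, here is a small sketch that formats a few of its entries. The helper name describe_transfer() is made up for this example; the keys url, http_code, total_time and size_download are among those present in the returned array.

```php
<?php
// Hypothetical helper: summarizes a few entries of the curl_getinfo() array.
function describe_transfer(array $info): string
{
    return sprintf(
        '%s returned HTTP %d in %.2f s (%d bytes)',
        $info['url'],
        $info['http_code'],
        $info['total_time'],     // seconds, as a float
        $info['size_download']   // bytes downloaded
    );
}

// Usage sketch, after curl_exec($ch) has run:
// echo describe_transfer(curl_getinfo($ch)), PHP_EOL;
```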
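That covers the curl_getinfo() function. Now let's look at some practical uses.
Practical uses:
The first use for the curl library is checking whether a page exists at a given URL. We can tell from the response code returned for the request; for example, 404 means the page does not exist. Here is an example: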
<?php
// create a new curl resource
$ch = curl_init();

// set URL and other appropriate options
curl_setopt($ch, CURLOPT_URL, "http://www.google.com/does/not/exist");
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);

// grab URL
$output = curl_exec($ch);

// Get response code
$response_code = curl_getinfo($ch, CURLINFO_HTTP_CODE);

// Not found?
if ($response_code == 404) {
    echo 'Page doesn\'t exist';
} else {
    echo $output;
}
?>
Another use might be an automatic link checker that verifies that every requested page exists.
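Such a checker could look roughly like the sketch below. It uses CURLOPT_NOBODY so that curl sends a HEAD request and skips downloading the page body. The function names and the rule "any 2xx or 3xx status counts as existing" are assumptions for this illustration, and some servers answer HEAD requests differently from GET requests.

```php
<?php
// Hypothetical rule: 2xx and 3xx count as "exists"; 0 means no HTTP response.
function status_means_exists(int $code): bool
{
    return $code >= 200 && $code < 400;
}

// Hypothetical helper: checks a single URL without downloading its body.
function url_exists(string $url): bool
{
    $ch = curl_init();
    curl_setopt($ch, CURLOPT_URL, $url);
    curl_setopt($ch, CURLOPT_NOBODY, true);          // send a HEAD request
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);  // do not print anything
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);  // judge the final page
    curl_exec($ch);

    return status_means_exists((int) curl_getinfo($ch, CURLINFO_HTTP_CODE));
}

// Usage sketch (performs real requests, so it is commented out):
// foreach (array('http://www.google.com/', 'http://www.google.com/does/not/exist') as $url) {
//     echo $url, ' => ', url_exists($url) ? 'ok' : 'missing', PHP_EOL;
// }
```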
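We could also use the curl library to write a web spider similar to Google's, or any other kind of spider. This article is not about writing a web spider, so we have not gone into the details, but a future PHPit article will show how to build a web spider with curl.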
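Conclusion:
In this article I have shown how to use the curl library in PHP, along with many of its options.
For the most basic task of simply fetching a web page, you may not need the curl library, but as soon as you want to do anything slightly more advanced, you will probably want to use it.
In the near future I will show you exactly how to build your own web spider, similar to Google's. Stay tuned to PHPit.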