Home > Backend Development > PHP Problem > How to crawl only the header of the web page in php

How to crawl only the header of the web page in php

藏色散人
Release: 2023-03-13 09:14:01
Original
2007 people have browsed it

How to use php to only grab the web page header: 1. Use the get_headers() function; 2. Use the http_response_header method; 3. Use the stream_get_meta_data() function; 4. Use php CURL to get the web page header.

How to crawl only the header of the web page in php

The operating environment of this article: windows7 system, PHP7.1 version, DELL G3 computer

How to crawl only the web page header with php?

php 4 ways to get web page header information

phpThere are many ways to get web page header information. As far as php language is concerned, I know There are 4 methods, one by one below.

Method 1: Use the get_headers() function

Recommended index: ★★★★★

The get_header method is the simplest and can be done with just two lines of code. As follows:

$thisurl = "http://www.lao8.org/";
print_r(get_headers($thisurl, 1));
Copy after login

The result obtained is:

Array
(
    [0] => HTTP/1.1 200 OK
    [Cache-Control] => max-age=86400
    [Content-Length] => 76102
    [Content-Type] => text/html
    [Content-Location] => http://www.lao8.org/index.html
    [Last-Modified] => Fri, 19 Jul 2013 03:52:30 GMT
    [Accept-Ranges] => bytes
    [ETag] => "50bc48643384ce1:5cb3"
    [Server] => Microsoft-IIS/6.0
    [X-Powered-By] => ASP.NET
    [Date] => Fri, 19 Jul 2013 09:06:39 GMT
    [Connection] => close
)
Copy after login

Method 2: Use http_response_header

Recommended index: ★★★

http_response_headerf method is also very simple, Only three lines:

$thisurl = "http://www.lao8.org";
$html = file_get_contents($thisurl ); 
print_r($http_response_header);
Copy after login

The result obtained is:

Array
(
    [0] => HTTP/1.1 200 OK
    [1] => Cache-Control: max-age=86400
    [2] => Content-Length: 76102
    [3] => Content-Type: text/html
    [4] => Content-Location: http://www.lao8.org/index.html
    [5] => Last-Modified: Fri, 19 Jul 2013 03:52:30 GMT
    [6] => Accept-Ranges: bytes
    [7] => ETag: "50bc48643384ce1:5cb3"
    [8] => Server: Microsoft-IIS/6.0
    [9] => X-Powered-By: ASP.NET
    [10] => Date: Fri, 19 Jul 2013 09:06:41 GMT
    [11] => Connection: close
)
Copy after login

Method 3: Use the stream_get_meta_data() function

Recommended index: ★★★

Use The stream_get_meta_data() code only requires three lines:

$thisurl = "http://www.lao8.org/";
$fp = fopen($thisurl, 'r'); 
print_r(stream_get_meta_data($fp));
Copy after login

The result obtained is:

Array
(
    [wrapper_data] => Array
        (
            [0] => HTTP/1.1 200 OK
            [1] => Cache-Control: max-age=86400
            [2] => Content-Length: 76102
            [3] => Content-Type: text/html
            [4] => Content-Location: http://www.lao8.org/index.html
            [5] => Last-Modified: Fri, 19 Jul 2013 03:52:30 GMT
            [6] => Accept-Ranges: bytes
            [7] => ETag: "50bc48643384ce1:5cb3"
            [8] => Server: Microsoft-IIS/6.0
            [9] => X-Powered-By: ASP.NET
            [10] => Date: Fri, 19 Jul 2013 09:06:41 GMT
            [11] => Connection: close
        )
    [wrapper_type] => http
    [stream_type] => tcp_socket
    [mode] => r+
    [unread_bytes] => 1086
    [seekable] => 
    [uri] => http://www.lao8.org/
    [timed_out] => 
    [blocked] => 1
    [eof] => 
)
Copy after login

The fourth method: Use PHP’s advanced function CURL() to obtain

Recommendation index: ★★★★

The above three methods can obtain general web page header information. If you want to obtain more detailed header information, such as whether GZip compression is enabled on the web page. At this time, you can use PHP's advanced function curl() to obtain it.

Use curl to get the header to detect GZip compression

Post the code first:

<?php
$szUrl = &#39;http://www.lao8.org/&#39;;
$curl = curl_init();
curl_setopt($curl, CURLOPT_URL, $szUrl);
curl_setopt($curl, CURLOPT_HEADER, 1);  //输出header信息
curl_setopt($curl, CURLOPT_RETURNTRANSFER, 1);  //不显示网页内容
curl_setopt($curl, CURLOPT_ENCODING, &#39;&#39;); //允许执行gzip
$data=curl_exec($curl); 
if(!curl_errno($curl))
{
    $info = curl_getinfo($curl);
    $httpHeaderSize = $info[&#39;header_size&#39;];  //header字符串体积
    $pHeader = substr($data, 0, $httpHeaderSize); //获得header字符串
    $split   = array("rn", "n", "r");  //需要格式化header字符串
    $pHeader = str_replace($split, &#39;<br>&#39;, $pHeader); //使用<br>换行符格式化输出到网页上
    echo $pHeader;
}
?>
Copy after login

The output results are as follows:

HTTP/1.1 200 OK
Cache-Control: max-age=86400
Content-Length: 15189
Content-Type: text/html
Content-Encoding: gzip
Content-Location: http://www.lao8.org/index.html
Last-Modified: Fri, 19 Jul 2013 03:52:28 GMT
Accept-Ranges: bytes
ETag: "0268633384ce1:5cb3"
Vary: Accept-Encoding
Server: Microsoft-IIS/6.0
X-Powered-By: ASP.NET
Date: Fri, 19 Jul 2013 09:27:21 GMT
Copy after login

You can see that the header information obtained using curl has this line: Content-Encoding: gzip, and GZip compression is enabled on the web page.

Recommended learning: "PHP Video Tutorial"

The above is the detailed content of How to crawl only the header of the web page in php. For more information, please follow other related articles on the PHP Chinese website!

Related labels:
php
Statement of this Website
The content of this article is voluntarily contributed by netizens, and the copyright belongs to the original author. This site does not assume corresponding legal responsibility. If you find any content suspected of plagiarism or infringement, please contact admin@php.cn
Latest Issues
php data acquisition?
From 1970-01-01 08:00:00
0
0
0
PHP extension intl
From 1970-01-01 08:00:00
0
0
0
How to learn php well
From 1970-01-01 08:00:00
0
0
0
Popular Tutorials
More>
Latest Downloads
More>
Web Effects
Website Source Code
Website Materials
Front End Template