Home Backend Development PHP Problem How to read the source code of the redirected web page in PHP

How to read the source code of the redirected web page in PHP

Mar 31, 2023 am 09:05 AM

PHP is a widely used server-side scripting language that helps developers create dynamic web applications. However, sometimes PHP developers need to read the source code of an external web page, which may be a jump link. In this article, we will learn how to use PHP to read the source code of a redirect link.

Note: In this article, we will assume that you are already familiar with the PHP language and have a basic understanding of HTML and HTTP protocols.

Step 1: Open the link using cURL

cURL is a library used to process URLs in PHP. In order to read the source code of the linked web page, we need to use cURL to open the link. The following is the basic code for using cURL to open a web page in PHP:

$url = 'http://www.example.com';
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, $url);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
$output = curl_exec($ch);
curl_close($ch);
Copy after login

In the above code, we first define the link address of the web page to be read, then create a cURL handle and set the access link option, sent a cURL request and obtained the response result. The result is saved in the $output variable.

Step 2: Handle jump links

In some cases, the link we open may be a jump link, which means it will redirect to another link. In order to obtain the source code of the redirected web page, we need to check the response header information to determine whether there is a Location header. If it exists, it means that this is a jump link, and the redirected link address is stored in Location. We need to use cURL to open this redirected link to obtain the source code.

The following is a code example:

$url = 'http://www.example.com';
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, $url);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
$output = curl_exec($ch);
$info = curl_getinfo($ch);
curl_close($ch);

if ($info['http_code'] == 301 || $info['http_code'] == 302) {
    $url = $info['redirect_url'];
    $ch = curl_init();
    curl_setopt($ch, CURLOPT_URL, $url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    $output = curl_exec($ch);
    curl_close($ch);
}
Copy after login

In the above code, we added a curl_setopt option: CURLOPT_FOLLOWLOCATION. This option tells cURL to follow redirects and automatically open new links. Then, we obtain the response header information and determine whether there is redirection information. If it exists, we use the curl_init() function to create a new cURL handle, open the redirect link, and obtain the source code.

Step Three: Parse the Source Code

After obtaining the source code of the web page, we need to further parse it so that we can process the data. We can use PHP's built-in DOMDocument class to parse HTML documents.

The following is a code example:

$url = 'http://www.example.com';
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, $url);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
$output = curl_exec($ch);
$info = curl_getinfo($ch);
curl_close($ch);

if ($info['http_code'] == 301 || $info['http_code'] == 302) {
    $url = $info['redirect_url'];
    $ch = curl_init();
    curl_setopt($ch, CURLOPT_URL, $url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    $output = curl_exec($ch);
    curl_close($ch);
}

$doc = new DOMDocument();
@$doc->loadHTML($output);
$elements = $doc->getElementsByTagName('html');
$title = $doc->getElementsByTagName('title')->item(0)->nodeValue;
Copy after login

In the above code, we first create a DOMDocument object, and then call the loadHTML() function to pass in the obtained web page source code as a parameter. Next, we use the getElementsByTagName() function to get the specified element and the nodeValue attribute to get the text content of the element. In this example, we get the HTML element and title element.

Step 4: Process the data

Finally, we can process the obtained data and store or display it as needed. The following is a simple example:

$url = 'http://www.example.com';
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, $url);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
$output = curl_exec($ch);
$info = curl_getinfo($ch);
curl_close($ch);

if ($info['http_code'] == 301 || $info['http_code'] == 302) {
    $url = $info['redirect_url'];
    $ch = curl_init();
    curl_setopt($ch, CURLOPT_URL, $url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    $output = curl_exec($ch);
    curl_close($ch);
}

$doc = new DOMDocument();
@$doc->loadHTML($output);
$title = $doc->getElementsByTagName('title')->item(0)->nodeValue;
echo "源码标题是:" . $title . "\n";
echo "HTML源码是:" . $output;
Copy after login

In the above code, we first get the title of the web page, and then directly output the HTML source code.

Conclusion

In this article, we learned how to use PHP to read the source code of the redirected web page. By using cURL to open links, process jump links, parse HTML documents and process data, we can easily read the source code of the web page for jump links. This is a very useful skill when you need to use web crawlers, data analysis, data mining and other scenarios.

The above is the detailed content of How to read the source code of the redirected web page in PHP. For more information, please follow other related articles on the PHP Chinese website!

Statement of this Website
The content of this article is voluntarily contributed by netizens, and the copyright belongs to the original author. This site does not assume corresponding legal responsibility. If you find any content suspected of plagiarism or infringement, please contact admin@php.cn

Hot AI Tools

Undresser.AI Undress

Undresser.AI Undress

AI-powered app for creating realistic nude photos

AI Clothes Remover

AI Clothes Remover

Online AI tool for removing clothes from photos.

Undress AI Tool

Undress AI Tool

Undress images for free

Clothoff.io

Clothoff.io

AI clothes remover

Video Face Swap

Video Face Swap

Swap faces in any video effortlessly with our completely free AI face swap tool!

Hot Tools

Notepad++7.3.1

Notepad++7.3.1

Easy-to-use and free code editor

SublimeText3 Chinese version

SublimeText3 Chinese version

Chinese version, very easy to use

Zend Studio 13.0.1

Zend Studio 13.0.1

Powerful PHP integrated development environment

Dreamweaver CS6

Dreamweaver CS6

Visual web development tools

SublimeText3 Mac version

SublimeText3 Mac version

God-level code editing software (SublimeText3)

OWASP Top 10 PHP: Describe and mitigate common vulnerabilities. OWASP Top 10 PHP: Describe and mitigate common vulnerabilities. Mar 26, 2025 pm 04:13 PM

The article discusses OWASP Top 10 vulnerabilities in PHP and mitigation strategies. Key issues include injection, broken authentication, and XSS, with recommended tools for monitoring and securing PHP applications.

PHP 8 JIT (Just-In-Time) Compilation: How it improves performance. PHP 8 JIT (Just-In-Time) Compilation: How it improves performance. Mar 25, 2025 am 10:37 AM

PHP 8's JIT compilation enhances performance by compiling frequently executed code into machine code, benefiting applications with heavy computations and reducing execution times.

PHP Encryption: Symmetric vs. asymmetric encryption. PHP Encryption: Symmetric vs. asymmetric encryption. Mar 25, 2025 pm 03:12 PM

The article discusses symmetric and asymmetric encryption in PHP, comparing their suitability, performance, and security differences. Symmetric encryption is faster and suited for bulk data, while asymmetric is used for secure key exchange.

PHP Secure File Uploads: Preventing file-related vulnerabilities. PHP Secure File Uploads: Preventing file-related vulnerabilities. Mar 26, 2025 pm 04:18 PM

The article discusses securing PHP file uploads to prevent vulnerabilities like code injection. It focuses on file type validation, secure storage, and error handling to enhance application security.

PHP Authentication & Authorization: Secure implementation. PHP Authentication & Authorization: Secure implementation. Mar 25, 2025 pm 03:06 PM

The article discusses implementing robust authentication and authorization in PHP to prevent unauthorized access, detailing best practices and recommending security-enhancing tools.

PHP CSRF Protection: How to prevent CSRF attacks. PHP CSRF Protection: How to prevent CSRF attacks. Mar 25, 2025 pm 03:05 PM

The article discusses strategies to prevent CSRF attacks in PHP, including using CSRF tokens, Same-Site cookies, and proper session management.

PHP Input Validation: Best practices. PHP Input Validation: Best practices. Mar 26, 2025 pm 04:17 PM

Article discusses best practices for PHP input validation to enhance security, focusing on techniques like using built-in functions, whitelist approach, and server-side validation.

PHP API Rate Limiting: Implementation strategies. PHP API Rate Limiting: Implementation strategies. Mar 26, 2025 pm 04:16 PM

The article discusses strategies for implementing API rate limiting in PHP, including algorithms like Token Bucket and Leaky Bucket, and using libraries like symfony/rate-limiter. It also covers monitoring, dynamically adjusting rate limits, and hand

See all articles