file_get_contents() Distorts UTF-8 Characters: A Resolution
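When loading UTF-8 encoded HTML from external sources with file_get_contents(), special characters can appear garbled. file_get_contents() does not convert encodings itself; it returns the bytes exactly as received, so the distortion usually comes from a mismatch between the encoding the content actually uses and the encoding it is later treated as. To address this issue: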
Examine Encoding Settings:
Ensure that the remote server is serving the HTML in the correct UTF-8 encoding. Check the Content-Type header to confirm the encoding declared by the server.
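As a rough illustration of checking the declared encoding, the sketch below pulls the charset out of raw response header lines. The helper name charset_from_headers() is hypothetical, and it assumes the headers are available as an array of strings (for example, the lines PHP collects in $http_response_header after an HTTP fetch).

```php
<?php
// Hypothetical helper: extract the charset from raw response header lines.
function charset_from_headers(array $headers): ?string
{
    $charset = null;
    foreach ($headers as $line) {
        // Matches lines such as "Content-Type: text/html; charset=UTF-8".
        if (preg_match('/^Content-Type:.*?charset\s*=\s*"?([\w.:-]+)"?/i', $line, $m)) {
            // Keep the last match, since redirects can produce several header sets.
            $charset = strtoupper($m[1]);
        }
    }
    return $charset; // null when no charset was declared
}

// Usage sketch (requires network access, so it is left commented out):
// $html = file_get_contents('http://example.com');
// $charset = charset_from_headers($http_response_header ?? []);
```

A null result means the server declared no charset in the header, in which case the encoding may still be declared inside the document itself through a meta tag.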
Convert the Encoding with Native PHP Functions:
If the content arrives in a different encoding, convert it explicitly. Use mb_detect_encoding() to identify the encoding of the returned content, then use mb_convert_encoding() or iconv() to convert it to the desired encoding (e.g., UTF-8). Pass more than one candidate encoding to mb_detect_encoding(): with 'UTF-8' as the only candidate in strict mode, it returns false for any content that is not valid UTF-8, leaving nothing to convert from.
$html = mb_convert_encoding($html, 'UTF-8', mb_detect_encoding($html, 'UTF-8, ISO-8859-1', true));
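The detection and conversion step can be wrapped in a small function. This is a minimal sketch, assuming the mbstring extension is loaded; the name to_utf8(), the candidate list, and the ISO-8859-1 fallback are illustrative choices rather than part of any library.

```php
<?php
// Hypothetical helper: return $text as UTF-8, converting only when needed.
function to_utf8(string $text, array $candidates = ['UTF-8', 'ISO-8859-1']): string
{
    // Already valid UTF-8: return unchanged to avoid double encoding.
    if (mb_check_encoding($text, 'UTF-8')) {
        return $text;
    }

    // Strict detection returns false when none of the candidates fit.
    $from = mb_detect_encoding($text, $candidates, true);
    if ($from === false) {
        $from = 'ISO-8859-1'; // assumed fallback, adjust for your data
    }

    return mb_convert_encoding($text, 'UTF-8', $from);
}
```

For example, to_utf8("caf\xE9") converts the ISO-8859-1 byte 0xE9 into the two-byte UTF-8 sequence for "é", while a string that is already valid UTF-8 passes through untouched.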
Consider HTML Entities:
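If the characters are still distorted, consider converting them to HTML entities with htmlentities(). Note that htmlentities() also escapes markup characters such as < and >, so the fetched HTML is displayed as text rather than rendered.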
$html = htmlentities($html, ENT_QUOTES, 'UTF-8');
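To make the effect of the entity conversion concrete, the sketch below contrasts htmlentities() with an approach that encodes only non-ASCII characters and leaves the tags intact. The helper encode_non_ascii() is a hypothetical name; it relies on mb_encode_numericentity() from the mbstring extension and assumes the input is already valid UTF-8.

```php
<?php
// Hypothetical helper: replace non-ASCII characters with numeric entities,
// leaving ASCII markup such as <p> untouched.
function encode_non_ascii(string $html): string
{
    // Conversion map: [first code point, last code point, offset, mask].
    $map = [0x80, 0x10FFFF, 0, 0x1FFFFF];
    return mb_encode_numericentity($html, $map, 'UTF-8');
}

$sample = "<p>caf\xC3\xA9</p>"; // "<p>café</p>" as UTF-8 bytes

// htmlentities() escapes the tags as well as the accented character.
$escaped = htmlentities($sample, ENT_QUOTES, 'UTF-8');

// encode_non_ascii() keeps the tags and encodes only the accented character.
$preserved = encode_non_ascii($sample);
```

Which form is appropriate depends on the goal: the first suits showing the fetched source as text, the second suits passing the markup on to something that should still treat it as HTML.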
Example:
The following example demonstrates how to load HTML with UTF-8 characters and convert them to HTML entities:
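<!DOCTYPE html>
<html>
<head>
    <meta charset="UTF-8">
    <title>Test</title>
</head>
<body>
<?php
// Fetch the remote page; file_get_contents() returns false on failure.
$html = file_get_contents('http://example.com');
if ($html !== false) {
    // Escape the content so it is shown as text with its UTF-8 characters intact.
    echo htmlentities($html, ENT_QUOTES, 'UTF-8');
}
?>
</body>
</html>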