Why Does PHP DOMDocument::loadHTML Fail with UTF-8 Encoding, and How Can I Fix It?

Linda Hamilton
Release: 2024-12-23 05:28:14

Failed to Encode UTF-8 with PHP DOMDocument::loadHTML

Parsing HTML with DOMDocument::loadHTML can garble non-ASCII text when the input is UTF-8 and carries no encoding declaration. This article explains why that happens and shows three ways to fix it.

Cause of the Issue

Unless the markup declares an encoding, DOMDocument::loadHTML assumes the string is ISO-8859-1, the default character set in HTTP/1.1. Each byte of a multi-byte UTF-8 sequence is then read as a separate character, so non-ASCII text comes out garbled.
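
The effect is easy to reproduce. The following is a minimal sketch rather than part of the original discussion: the helper name paragraphText and the sample string are illustrative assumptions. It loads the same UTF-8 paragraph twice, once as-is and once with a charset declaration prepended, and compares what comes back.

```php
<?php
// Hypothetical helper for illustration: parse $html and return the text of its first <p>.
function paragraphText(string $html): string
{
    $dom = new DOMDocument();
    $dom->loadHTML($html);
    return $dom->getElementsByTagName('p')->item(0)->textContent;
}

// "café" written with an escape so the bytes are UTF-8 regardless of file encoding.
$original = "caf\u{00E9}";

// No declaration: the parser has to guess how to decode the bytes 0xC3 0xA9.
$undeclared = paragraphText("<p>$original</p>");

// Declaration prepended: the parser is told the bytes are UTF-8.
$meta = '<meta http-equiv="Content-Type" content="text/html; charset=utf-8">';
$declared = paragraphText($meta . "<p>$original</p>");

var_dump($undeclared === $original); // comparison for the undeclared input
var_dump($declared === $original);   // comparison for the declared input
```

On libxml2 builds that fall back to a single-byte encoding, the first comparison is false because the two bytes of é are decoded as two separate characters. The second comparison is true.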

Alternative Solutions

1. Prepending Encoding Declarations

For simple (X)HTML snippets, prepend an XML encoding declaration or a meta charset tag so that the parser treats the string as UTF-8:

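$dom = new DOMDocument();
// Option A: http-equiv style declaration
$contentType = '<meta http-equiv="Content-Type" content="text/html; charset=utf-8">';
$dom->loadHTML($contentType . $profile);

// Option B: shorter meta charset declaration
$dom->loadHTML('<meta charset="utf-8">' . $profile);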
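
The XML declaration variant mentioned above is not shown in the listing, so here is a minimal sketch of it. The sample string and variable names are illustrative assumptions. The cleanup loop that removes a leftover processing-instruction node is a commonly used follow-up step added here, not something the original describes.

```php
<?php
$profile = "<p>caf\u{00E9}</p>"; // sample UTF-8 input (assumed for illustration)

$dom = new DOMDocument();
// The leading XML declaration makes the parser read the bytes as UTF-8.
$dom->loadHTML('<?xml encoding="utf-8" ?>' . $profile);

// Optional cleanup: drop the declaration if it was kept as a processing-instruction node.
foreach (iterator_to_array($dom->childNodes) as $node) {
    if ($node->nodeType === XML_PI_NODE) {
        $dom->removeChild($node);
    }
}

$text = $dom->getElementsByTagName('p')->item(0)->textContent;
var_dump($text === "caf\u{00E9}"); // true when the declaration was honored
```

The loop iterates over a copy of the child list so that removing a node does not disturb the iteration.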

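2. Converting to HTML Entities (the SmartDOMDocument Workaround)

If you cannot tell whether the string already contains an encoding declaration, convert its non-ASCII characters to HTML entities before loading it, as SmartDOMDocument does: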

$dom->loadHTML(mb_convert_encoding($profile, 'HTML-ENTITIES', 'UTF-8'));

3. PHP 8.2+ Workaround

As of PHP 8.2, passing 'HTML-ENTITIES' to mb_convert_encoding is deprecated. Use mb_encode_numericentity instead:

$dom->loadHTML(mb_encode_numericentity($profile, [0x80, 0x10FFFF, 0, ~0], 'UTF-8'));
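
To see what this call does, the sketch below encodes a short UTF-8 fragment and then loads it. The sample string and variable names are illustrative assumptions. Every code point from 0x80 to 0x10FFFF becomes a decimal numeric character reference, so the parser only ever sees ASCII.

```php
<?php
$profile = "<p>caf\u{00E9}</p>"; // sample UTF-8 input (assumed for illustration)

// Conversion map: [first code point, last code point, offset, mask].
$map = [0x80, 0x10FFFF, 0, ~0];
$encoded = mb_encode_numericentity($profile, $map, 'UTF-8');
echo $encoded, PHP_EOL; // <p>caf&#233;</p>

$dom = new DOMDocument();
$dom->loadHTML($encoded);
$text = $dom->getElementsByTagName('p')->item(0)->textContent;
var_dump($text === "caf\u{00E9}"); // bool(true)
```

ASCII characters, including the markup itself, fall outside the map and pass through unchanged.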

Conclusion

Encoding problems with DOMDocument::loadHTML come down to the parser not being told that the input is UTF-8. Either declare the encoding in the markup or convert non-ASCII characters to entities before loading.

source:php.cn