Why Does PHP DOMDocument's loadHTML Fail with UTF-8 Encoding, and How Can I Fix It?

Barbara Streisand
Release: 2024-12-30 16:48:09

PHP DOMDocument::loadHTML Does Not Handle UTF-8 Correctly

DOMDocument::loadHTML treats its input as ISO-8859-1 unless the markup declares a different encoding. Each multibyte UTF-8 sequence is then read as several separate Latin-1 characters, so non-ASCII text comes out garbled.
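To show what this misreading looks like, the sketch below reproduces it without DOMDocument. It takes the UTF-8 bytes of "é" and reinterprets them as ISO-8859-1, which is roughly what the parser does when no charset is declared. The variable names are illustrative only.

```php
<?php
// "é" (U+00E9) is stored in UTF-8 as the two bytes 0xC3 0xA9.
$utf8 = "\u{00E9}";

// Reading those bytes as ISO-8859-1 yields two separate characters,
// "Ã" (0xC3) and "©" (0xA9); re-encoding them to UTF-8 shows the garbled text.
$misread = mb_convert_encoding($utf8, 'UTF-8', 'ISO-8859-1');

echo bin2hex($utf8), "\n"; // c3a9
echo $misread, "\n";       // Ã©
```

The same thing happens to every non-ASCII character in a document that loadHTML reads as ISO-8859-1.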

DOMDocument is built on libxml2's HTML parser, which expects HTML4 input. HTML5 documents can therefore cause additional problems; for example, HTML5-only elements such as <section> may be reported as invalid tags.
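A minimal sketch of how to cope with the HTML4 parser's complaints about HTML5 markup, assuming you only want to collect the diagnostics rather than show them as PHP warnings. Whether a warning appears for <section> depends on the libxml2 version, so the code just prints whatever errors were recorded.

```php
<?php
// Collect libxml2 diagnostics internally instead of emitting PHP warnings.
$previous = libxml_use_internal_errors(true);

$html = '<section><p>HTML5 content</p></section>';
$dom = new DOMDocument();
$dom->loadHTML($html);

// An HTML4 parser may flag HTML5-only tags such as <section> as unknown.
foreach (libxml_get_errors() as $error) {
    echo trim($error->message), "\n";
}
libxml_clear_errors();
libxml_use_internal_errors($previous);

// The element is still kept in the tree, warning or not.
echo $dom->getElementsByTagName('section')->length, "\n";
```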

Solution:

To resolve this issue, specify the character encoding of your HTML using one of the following methods:

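XML Encoding Declaration:

If you control the markup, start it with an XML encoding declaration, <?xml encoding="utf-8" ?>, before any HTML. libxml2 then reads the input as UTF-8 even though the rest of it is parsed as HTML.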
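A minimal sketch of this method, assuming the HTML string is UTF-8 and its very first bytes are the declaration. DOM properties in PHP always return UTF-8, so a correct parse gives the original text back unchanged.

```php
<?php
// The markup itself begins with an XML encoding declaration.
$html = '<?xml encoding="utf-8" ?><p>イリノイ州シカゴにて</p>';

$dom = new DOMDocument();
$dom->loadHTML($html);

// textContent is returned as UTF-8; it matches the source text only if
// the parser decoded the input as UTF-8.
echo $dom->getElementsByTagName('p')->item(0)->textContent, "\n";
```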

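Content-Type Header:

Declare the charset the way an HTTP Content-Type header would, using a meta tag in the document head: <meta http-equiv="Content-Type" content="text/html; charset=utf-8">.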
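A minimal sketch of the meta-tag method, assuming a full UTF-8 document whose <head> contains the Content-Type declaration before any non-ASCII text.

```php
<?php
// The charset is declared in the document head via http-equiv.
$html = '<html><head>'
      . '<meta http-equiv="Content-Type" content="text/html; charset=utf-8">'
      . '</head><body><p>イリノイ州シカゴにて</p></body></html>';

$dom = new DOMDocument();
$dom->loadHTML($html);

// If the meta tag was honored, the text comes back exactly as written.
echo $dom->getElementsByTagName('p')->item(0)->textContent, "\n";
```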

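XML Encoding Prefix:

If you cannot edit the markup itself, prepend the same declaration to the string at load time by passing '<?xml encoding="utf-8" ?>' . $html to loadHTML.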
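A sketch of the prefix approach, assuming a UTF-8 fragment you cannot modify. libxml2 keeps the prepended declaration as a processing-instruction node at document level, so it would otherwise appear in saveHTML() output; removing it afterwards, as below, is one possible cleanup rather than a required step.

```php
<?php
// Example input whose markup we treat as read-only.
$fragment = '<p>イリノイ州シカゴにて</p>';

$dom = new DOMDocument();
// Prepend the declaration only for parsing; $fragment itself is unchanged.
$dom->loadHTML('<?xml encoding="utf-8" ?>' . $fragment);

// Copy the child list first: DOMNodeList is live, and removing nodes while
// iterating it directly can skip entries.
foreach (iterator_to_array($dom->childNodes) as $node) {
    if ($node->nodeType === XML_PI_NODE) {
        $dom->removeChild($node);
    }
}

echo $dom->saveHTML();
```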

Workaround for Unknown HTML Content:

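If you cannot know in advance whether the input already contains an encoding declaration, use a workaround such as SmartDOMDocument, or convert every non-ASCII character to an HTML entity before parsing, as in the following PHP code (which assumes the input is UTF-8):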

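$profile = '<p>イリノイ州シカゴにて、アイルランド系の家庭に、9</p>';
$dom = new DOMDocument();
// Convert every non-ASCII character to an HTML entity so the parser's
// ISO-8859-1 default cannot misread the UTF-8 bytes
$dom->loadHTML(mb_convert_encoding($profile, 'HTML-ENTITIES', 'UTF-8'));
echo $dom->saveHTML();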

Caution for PHP 8.2 and later:

Since PHP 8.2, passing 'HTML-ENTITIES' as the target encoding to mb_convert_encoding triggers a deprecation notice. Use mb_encode_numericentity instead:

// Encode code points U+0080 to U+10FFFF as numeric character references (&#NNNN;)
$dom->loadHTML(mb_encode_numericentity($profile, [0x80, 0x10FFFF, 0, ~0], 'UTF-8'));

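While not ideal, this approach is safe: many characters (such as the Japanese text above) cannot be represented in ISO-8859-1 at all, but after the conversion the input is pure ASCII, which reads the same whether the parser assumes ISO-8859-1 or UTF-8.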
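For a self-contained version of the PHP 8.2+ approach, the sketch below wraps the conversion in a helper. The function name loadUtf8Html is hypothetical, and the code assumes the input string is UTF-8.

```php
<?php
// Hypothetical helper: parse UTF-8 HTML without relying on any declaration.
function loadUtf8Html(string $html): DOMDocument
{
    // Replace every code point from U+0080 to U+10FFFF with a decimal
    // character reference (&#NNNN;). ASCII, including all markup, is left
    // untouched, so the result is the same under ISO-8859-1 or UTF-8.
    $ascii = mb_encode_numericentity($html, [0x80, 0x10FFFF, 0, ~0], 'UTF-8');

    $dom = new DOMDocument();
    $dom->loadHTML($ascii);
    return $dom;
}

$profile = '<p>イリノイ州シカゴにて、アイルランド系の家庭に、9</p>';
$dom = loadUtf8Html($profile);

// The parser decodes the references, and DOM properties return UTF-8,
// so the original text comes back intact.
echo $dom->getElementsByTagName('p')->item(0)->textContent, "\n";
```

Encoding to references instead of relying on a declaration trades some output readability for independence from how the parser guesses the charset.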

source:php.cn