PHP DOMDocument loadHTML Not Encoding UTF-8 Correctly
Problem:
When HTML is parsed with PHP's DOMDocument::loadHTML(), UTF-8 characters are misinterpreted and the output is garbled.
Cause:
By default, DOMDocument::loadHTML() treats the input string as ISO-8859-1 unless the markup declares another encoding. Most HTML5 content, however, is UTF-8. When a UTF-8 string is loaded without any indication of its encoding, each byte of a multi-byte character is read as a separate ISO-8859-1 character.
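The effect can be reproduced without DOMDocument at all. The sketch below assumes the mbstring extension is loaded and uses illustrative variable names. It re-reads the two UTF-8 bytes of "é" as ISO-8859-1, which is the kind of misreading described above:

```php
<?php
// "é" encoded as UTF-8 is the two bytes 0xC3 0xA9.
$utf8 = "\xC3\xA9";

// Treat those bytes as ISO-8859-1 and convert them to UTF-8:
// each byte becomes a separate character.
$misread = mb_convert_encoding($utf8, 'UTF-8', 'ISO-8859-1');

echo bin2hex($utf8), "\n";     // c3a9
echo bin2hex($misread), "\n";  // c383c2a9, which displays as "Ã©"
```

One character has turned into two, which is the typical shape of the garbled output.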
Solution:
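To fix this, either tell the parser which encoding the input uses, or convert every non-ASCII character to a numeric character reference before loading, so that the string DOMDocument receives is plain ASCII and cannot be misread.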
Example:
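This code uses mb_encode_numericentity() to convert the string before loading it. The conversion map [0x80, 0x10FFFF, 0, ~0] selects every code point from 0x80 to 0x10FFFF, with an offset of 0 and a mask of ~0, so each non-ASCII character is replaced by its numeric entity: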
$profile = '<p>イリノイ州シカゴにて、アイルランド系の家庭に</p>';
$dom = new DOMDocument();
$dom->loadHTML(mb_encode_numericentity($profile, [0x80, 0x10FFFF, 0, ~0], 'UTF-8'));
echo $dom->saveHTML();
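Another commonly used option is to declare the encoding inside the markup itself. The following is a minimal sketch rather than a definitive implementation. It assumes the parser honours a Content-Type meta element placed ahead of the content; $declaration and $text are illustrative names.

```php
<?php
$profile = '<p>イリノイ州シカゴにて、アイルランド系の家庭に</p>';

// Assumption: a meta element placed before the content tells the parser
// to read the rest of the string as UTF-8.
$declaration = '<meta http-equiv="Content-Type" content="text/html; charset=utf-8">';

$dom = new DOMDocument();
$dom->loadHTML($declaration . $profile);

// Read the paragraph back as a UTF-8 string.
$text = $dom->getElementsByTagName('p')->item(0)->textContent;
echo $text, "\n";
```

The meta element becomes part of the parsed document, so it appears in the output of saveHTML() unless you remove it first.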
With the input prepared this way, DOMDocument parses and outputs UTF-8 characters correctly.