file_get_contents() Breaks UTF-8 Characters in HTML
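When file_get_contents() is used to retrieve a remote HTML document, some special characters may come out garbled once the content is parsed or displayed. This mainly affects UTF-8 encoded content containing characters such as Ľ, Š, Č, Ť, and Ž. Instead of rendering properly, they appear as sequences such as Å, ¾, and ¤. That pattern is typical of UTF-8 bytes being interpreted as a single-byte encoding such as ISO-8859-1. file_get_contents() itself returns the raw bytes unchanged, so the damage most likely happens when a later step assumes the wrong character set.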
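To make the symptom concrete, here is a minimal sketch that reproduces the garbling without any network access. It assumes the cause is UTF-8 bytes being read as ISO-8859-1, which matches the symbols listed above but is not confirmed by the original report. The function name misreadAsLatin1() is hypothetical.

```php
<?php
/**
 * Simulate a consumer that wrongly treats UTF-8 bytes as ISO-8859-1.
 * (Hypothetical helper, for illustration only.)
 */
function misreadAsLatin1(string $utf8): string
{
    // Each input byte is taken as one ISO-8859-1 character
    // and re-encoded as UTF-8 so it can be printed.
    return mb_convert_encoding($utf8, 'UTF-8', 'ISO-8859-1');
}

echo misreadAsLatin1("\u{0164}"), PHP_EOL; // Ť is bytes C5 A4, shown as Å¤
echo misreadAsLatin1("\u{017E}"), PHP_EOL; // ž is bytes C5 BE, shown as Å¾
```

Pure ASCII text passes through unchanged, which is why only the accented characters look broken.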
Solution:
To resolve this issue, convert the retrieved content to HTML entities with the mb_convert_encoding() function, using 'HTML-ENTITIES' as the target encoding and 'UTF-8' as the source encoding, before the markup is parsed or displayed.
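The original listing did not survive, so what follows is a sketch of the approach rather than the author's code. It rests on two assumptions: that the content is parsed with DOMDocument (the text only mentions a "loaded HTML document"), and that a current PHP version is in use. The 'HTML-ENTITIES' target of mb_convert_encoding() has been deprecated since PHP 8.2, so the sketch swaps in mb_encode_numericentity(), which emits numeric entities instead. The function names utf8ToHtmlEntities() and loadRemoteHtml() are hypothetical.

```php
<?php
/**
 * Replace every non-ASCII character in a UTF-8 string with a numeric
 * HTML entity, leaving ASCII markup untouched.
 */
function utf8ToHtmlEntities(string $html): string
{
    // On PHP < 8.2 the call the text describes would be:
    //   mb_convert_encoding($html, 'HTML-ENTITIES', 'UTF-8');
    // Conversion map: [first code point, last code point, offset, mask].
    $convmap = [0x80, 0x10FFFF, 0, 0x1FFFFF];
    return mb_encode_numericentity($html, $convmap, 'UTF-8');
}

/**
 * Hypothetical helper: fetch a remote page and parse it with DOMDocument
 * (assumed here; the original text does not name the parser).
 */
function loadRemoteHtml(string $url): DOMDocument
{
    $html = file_get_contents($url);
    if ($html === false) {
        throw new RuntimeException("Could not fetch $url");
    }

    $doc = new DOMDocument();
    // Real-world markup is often invalid; keep parser warnings internal.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML(utf8ToHtmlEntities($html));
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    return $doc;
}
```

For example, utf8ToHtmlEntities("Š") returns "&#352;". Calling loadRemoteHtml() with the page's URL would then return a document whose text nodes contain the correct characters.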
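HTML entities consist only of ASCII characters, so the converted markup no longer depends on which character set the consumer assumes. Converting the UTF-8 characters to their corresponding entities therefore ensures that special characters render properly in the loaded HTML document.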