
PHP simple DOM HTML: fixing garbled characters when parsing

WBOY

Released: 2016-08-08 09:28:17


1. Fixing garbled characters

Unsurprisingly, the very first thing I ran into was garbled output, even though, as the documentation says to, every character was UTF-8 encoded:

$html = '<p>你好</p>';
$dom = new DOMDocument();
@$dom->loadHTML($html);
echo $dom->documentElement->nodeValue;


But if it is changed to:

$html = '<p>你好</p>';
$dom = new DOMDocument();
@$dom->loadXML($html);
echo $dom->documentElement->nodeValue;


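Then the output is correct. Only later did I find out that loadHTML relies on the charset declared in a meta tag inside the HTML. If there is no such tag, the input is treated as ISO-8859-1, which is why the text comes out garbled. The fix is to prepend a tag like this to the string: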

$meta = '<meta http-equiv="Content-Type" content="text/html; charset=utf-8"/>';
@$dom->loadHTML($meta . $html);
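Putting the fix together, here is a minimal sketch that assumes the input is a UTF-8 fragment carrying no charset declaration of its own. The helper name load_utf8_html() is hypothetical, not a PHP API, and collecting parser warnings with libxml_use_internal_errors() instead of silencing them with @ is a choice of this sketch rather than part of the original code.

```php
<?php
// Hypothetical helper (not part of PHP): parse a UTF-8 HTML fragment with DOMDocument.
// Assumption: the fragment has no <meta charset> of its own.
function load_utf8_html(string $html): DOMDocument
{
    // Prepend the same meta tag as above so libxml's HTML parser decodes
    // the bytes as UTF-8 instead of falling back to ISO-8859-1.
    $meta = '<meta http-equiv="Content-Type" content="text/html; charset=utf-8"/>';

    $dom = new DOMDocument();
    // Collect parser warnings internally instead of suppressing them with @.
    $previous = libxml_use_internal_errors(true);
    $dom->loadHTML($meta . $html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    return $dom;
}

$dom = load_utf8_html('<p>你好</p>');
echo $dom->documentElement->nodeValue, "\n";
```

The meta tag ends up inside the generated head element and contains no text, so nodeValue of the document element should be just the two Chinese characters, intact.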


2. Recursion

HTML/XML is a recursive structure, so it naturally has to be traversed recursively:

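function _pretty_html_node($node){
    // Recursion stops at:
    // 1. XML_TEXT_NODE: return the text, escaped (see section 3)
    if ($node->nodeType == XML_TEXT_NODE) {
        return htmlspecialchars($node->nodeValue);
    }
    // 2. anything that is not an XML_ELEMENT_NODE (comments etc.): drop it
    if ($node->nodeType != XML_ELEMENT_NODE) {
        return '';
    }
    // 3. a node with no children: the loop below simply does nothing
    $child_text = '';
    foreach ($node->childNodes as $n) {
        $child_text .= _pretty_html_node($n);
    }
    // Then handle each tag differently
    $tag = strtolower($node->nodeName);
    $text = '';
    switch ($tag) {
        case 'a':
            $href = htmlspecialchars($node->getAttribute('href'));
            $text .= "<a href=\"$href\">$child_text</a>";
            break;
        // ... (other tags)
        default:
            // fallback: keep only the children's output
            $text .= $child_text;
    }
    return $text;
}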
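To see the recursion pattern on its own, here is a minimal sketch (the helper name dump_tree() is hypothetical) that walks any DOM node depth-first and returns one indented line per node name. A childless node, such as a text node, is the base case; every other node recurses into its children one level deeper. It uses loadXML so the tree contains exactly the tags in the string, since loadHTML would add html and body wrappers.

```php
<?php
// Hypothetical helper: one line per node, indented two spaces per depth level.
function dump_tree(DOMNode $node, int $depth = 0): string
{
    $out = str_repeat('  ', $depth) . $node->nodeName . "\n";
    // Base case: a node without children (e.g. a text node) returns only its own line.
    if (!$node->hasChildNodes()) {
        return $out;
    }
    // Recursive case: append the subtree of every child, one level deeper.
    foreach ($node->childNodes as $child) {
        $out .= dump_tree($child, $depth + 1);
    }
    return $out;
}

$dom = new DOMDocument();
$dom->loadXML('<div><p>hi</p><br/></div>');
echo dump_tree($dom->documentElement);
// div
//   p
//     #text
//   br
```

A walker like _pretty_html_node has the same shape; the difference is only in what each level returns for its tag.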


3. Handling escaped characters

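For text nodes, nodeValue must be passed through htmlspecialchars() before it is output, because the text is unescaped when the HTML/XML is read: for example, &gt; is already a plain > in memory.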
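A small sketch of that round trip, assuming UTF-8 input: the parser decodes &gt; and &amp; while loading, so nodeValue holds the raw characters, and htmlspecialchars() restores the entities before the text is written back out as HTML. The ENT_QUOTES flag and the explicit 'UTF-8' argument are choices of this sketch.

```php
<?php
$dom = new DOMDocument();
$dom->loadXML('<p>a &gt; b &amp; c</p>');

// The parser has already decoded the entities: nodeValue holds raw characters.
$raw = $dom->documentElement->nodeValue;                  // "a > b & c"

// Re-escape before writing the text back into HTML output.
$escaped = htmlspecialchars($raw, ENT_QUOTES, 'UTF-8');  // "a &gt; b &amp; c"

echo $raw, "\n", $escaped, "\n";
```

Skipping the second step would emit a bare > and & into the generated HTML, which is exactly the bug the text warns about.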

Download the source: pretty_html.php
