©
This document uses PHP Chinese website manual Release
(PHP 4, PHP 5)
xml_parser_create — 建立一个 XML 解析器
$encoding
] )函数 xml_parser_create() 建立一个新的 XML 解析器并返回可被其它 XML 函数使用的资源句柄。
可选参数 encoding
在 PHP 4 中用来指定要被解析的 XML 输入的字符编码方式。PHP 5 开始,自动侦测输入的 XML 的编码,因此 encoding
参数仅用来指定解析后输出数据的编码。在 PHP 4 总,默认输出的编码与输入数据的编码是相同的。如果传递了空字符串,解析器会尝试搜索头 3 或 4 个字节以确定文档的编码。在 PHP 5.0.0 和 5.0.1 总,默认输出的字符编码是 ISO-8859-1,而 PHP 5.0.2 及以上版本是 UTF-8。解析器支持的编码有 ISO-8859-1, UTF-8 和 US-ASCII。
请参阅函数 xml_parser_create_ns() 和 xml_parser_free() 。
[#1] marek995 at seznam dot cz [2010-09-24 11:46:17]
I created a function, which combines xml_paresr_create and all functions around.
<?php
function html_parse($file)
{
$array = str_split($file, 1);
$count = false;
$text = "";
$end = false;
foreach($array as $temp)
{
switch($temp)
{
case "<":
between($text);
$text = "";
$count = true;
$end = false;
break;
case ">":
if($end == true) {end_tag($text);}
else {start_tag($text);}
$text = "";
break;
case "/":
if($count == true) {$end = true;}
else {$text = $text . "/";}
break;
default:
$count = false;
$text = $text . $temp;
}
}
}
?>
The input value is a string.
It calls functions start_tag() , between() and end_tag() just like the original xml parser.
But it has a few differences:
- It does NOT check the code. Just resends values to that three functions, no matter, if they are right
- It works with parameters. For example: from tag <sth b="42"> sends sth b="42"
- It works wit diacritics. The original parser sometimes wrapped the text before the first diacritics appearance.
- Works with all encoding. If the input is UTF-8, the output will be UTF-8 too
- It works with strings. Not with file pointers.
- No "Reserved XML name" error
- No doctype needed
- It does not work with commentaries, notes, programming instructions etc. Just the tags
definition of the handling functions is:
<?php
function between($stuff) {}
?>
No other attributes
[#2] juanhdv at NOSPAM dot divvol dot org [2007-12-05 12:08:09]
In PHP 5, when including in your xml file the definition '
<?phpxml version="1.0" encoding="ISO-8859-1" ?>
', I'd also recommend adding the option below:
xml_parser_set_option($xml_parser,XML_OPTION_TARGET_ENCODING, "ISO-8859-1").
It works fine!
If your enconding is 'UTF-8', just replace 'ISO-8859-1'.
[#3] [2006-04-19 07:42:51]
I'd also recommend adding the option below
xml_parser_set_option($parser,XML_OPTION_SKIP_WHITE,1);
[#4] Tobbe [2005-05-25 06:01:32]
The above "XML to array" code does not work properly if you have several tags on the same level and with the same name, example:
<currenterrors>
<error>
<description>This is a real error...</description>
</error>
<error>
<description>This is a second error...</description>
</error>
<error>
<description>Lots of errors today...</description>
</error>
<error>
<description>This is the last error...</description>
</error>
</currenterrors>
It will then only display the first <error>-tag.
In this case you will need to number the tags automatically or maybe have several arrays for each new element.
[#5] php at stock-consulting dot com [2005-02-21 02:47:25]
Even though I passed "UTF-8" as encoding type PHP (Version 4.3.3) did *not* treat the input file as UTF-8. The input file was missing the BOM header bytes (which may indeed be omitted, according to RFC3629...but things are a bit unclear there. The RFC seems to make mere recommendations concering the BOM header). If you want to sure that PHP treats an UTF-8 encoded file correctly, make sure that it begins with the corresponding 3 byte BOM header (0xEF 0xBB 0xBF)
[#6] jcalvert at gmx dot net [2004-04-03 10:39:43]
To maintain compatibility between PHP4 and PHP5 you should always pass a string argument to this function. PHP4 autodetects the format of the input if you leave it out whereas PHP5 will assume the format to be ISO-8859-1 (and choke on the byte order marker of UTF-8 files).
Calling the function as
<?php $res = xml_parser_create('') ?>
will cause both versions of PHP to autodetect the format.