


Frequently asked questions about getting started with XML (3)
How to load documents with foreign and special characters?
Documents can contain foreign (non-ASCII) characters, for example:
foreign characters (úóí?)
Characters such as úóí are only read correctly when the parser knows how they are encoded. The document must either be encoded as UTF-8 or begin with an XML declaration that names the encoding actually used. With the encoding declared, the same content is accepted:
foreign characters (úóí?)
The XML now loads correctly.
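The effect of the encoding declaration can be reproduced outside MSXML. The following is a minimal sketch using Python's standard xml.etree.ElementTree parser rather than MSXML; the test element and the choice of ISO-8859-1 are assumptions made for illustration only.

```python
import xml.etree.ElementTree as ET

text = "foreign characters (úóí)"

# Hypothetical document: store the characters as ISO-8859-1, one byte each.
body = f"<test>{text}</test>".encode("iso-8859-1")

# Without a declaration the parser assumes UTF-8, and these bytes are invalid.
try:
    ET.fromstring(body)
    loaded_without_declaration = True
except ET.ParseError:
    loaded_without_declaration = False

# With a declaration naming the real encoding, the same bytes load correctly.
declared = b'<?xml version="1.0" encoding="iso-8859-1"?>' + body
loaded_text = ET.fromstring(declared).text

print(loaded_without_declaration, loaded_text)
```

Saving the file as UTF-8 instead would make the declaration optional, since UTF-8 is the default encoding for XML.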
Other characters are reserved in XML and need to be handled differently. The following XML:
<foo>This & that</foo>
produces the following error:
Whitespace is not allowed at this location.
Line 0000001: <foo>This & that</foo>
Position 0000012: ----------^
Here the & is part of the XML syntax, so when it is placed literally in the XML source it is not interpreted as an ampersand character. You need to replace such special characters with escape sequences called "entities":
<foo>This &amp; that</foo>
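When XML is generated by a program, the replacement is usually left to a library. The sketch below uses Python's standard library instead of MSXML, and the foo element name is only illustrative; it shows the bare ampersand being rejected and the escaped form loading back to the original text.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

raw = "This & that"

# A bare ampersand makes the document ill-formed, so parsing fails.
try:
    ET.fromstring(f"<foo>{raw}</foo>")
    bare_ampersand_ok = True
except ET.ParseError:
    bare_ampersand_ok = False

# escape() replaces &, < and > with &amp;, &lt; and &gt;.
escaped = escape(raw)

# The parser expands the entity again, so the original text comes back.
round_trip = ET.fromstring(f"<foo>{escaped}</foo>").text

print(bare_ampersand_ok, escaped, round_trip)
```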
The following characters have corresponding built-in entities:
<    &lt;
&    &amp;
>    &gt;
"    &quot;
'    &apos;
Quote characters are used as attribute value delimiters and therefore generally cannot be used inside an attribute value. For example, an attribute value that is delimited by single quotes and also contains a single quote returns an error, because the single quote is used both as the attribute delimiter and within the attribute value itself. To correct this problem, you can change the attribute delimiter to double quotes, or you can escape the single quotes inside the value as the entity &apos;.
You can also handle special characters in element content by placing the text in a CDATA section. The following is correct:
<xml><![CDATA[This & that is just "text" content.]]></xml>
In this example, the XML object model exposes the CDATA node as a child of the xml element, and its nodeValue returns the string:
This & that is just "text" content.
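The attribute quoting options and the CDATA behavior described above can be checked with a short script. This is a sketch built on Python's standard library (xml.sax.saxutils, ElementTree and minidom), not on MSXML; the foo element and its a attribute are hypothetical names chosen for the illustration.

```python
import xml.etree.ElementTree as ET
from xml.dom import Node
from xml.dom.minidom import parseString
from xml.sax.saxutils import quoteattr

value = "This 'and' that"

# Single-quote delimiters around a value containing a single quote fail.
try:
    ET.fromstring(f"<foo a='{value}'/>")
    single_quoted_ok = True
except ET.ParseError:
    single_quoted_ok = False

# Option 1: quoteattr() chooses a delimiter that does not clash with the value.
quoted = quoteattr(value)
from_double_quotes = ET.fromstring(f"<foo a={quoted}/>").get("a")

# Option 2: keep the single-quote delimiters and escape the quotes as &apos;.
escaped = value.replace("'", "&apos;")
from_entity = ET.fromstring(f"<foo a='{escaped}'/>").get("a")

# CDATA: the parser leaves & and " inside the section untouched.
doc = parseString('<xml><![CDATA[This & that is just "text" content.]]></xml>')
cdata = doc.documentElement.firstChild
is_cdata = cdata.nodeType == Node.CDATA_SECTION_NODE
cdata_value = cdata.nodeValue

print(single_quoted_ok, quoted, from_entity, is_cdata, cdata_value)
```

Both attribute options yield the same parsed value, so the choice between them is mostly a matter of which delimiter the generating code already uses.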
How to use the MSXML COM component in Visual C++ 6.0?
The easiest way to use the MSXML COM components in Visual C++ 6.0 is the #import directive, applied to the MSXML type library.
It defines all the IXML* interfaces and interface IDs, so you can use them in your application. The MSXML type library and header files are also available from INETSDK, as is uuid.lib, which contains the class and interface IDs.
How to use HTML entities in XML?
The following XML contains an HTML entity:
<copyright>Copyright &copy; 2000, Microsoft Inc, All rights reserved.</copyright>
It generates the following error:
Line: 1, Location: 23, Error code: 0xC00CE002
<copyright>Copyright &copy; 2000, ...
This is because XML has only five built-in entities. For details about the built-in entities, see "How to load documents with foreign and special characters?" above. To use HTML entities, they need to be defined in a DTD. For details on DTDs, see the W3C XML Recommendation. To use such a DTD, reference it from a DOCTYPE declaration placed in front of the document element, so that the copyright element shown above is preceded by the DOCTYPE.
To load it, you need to turn off the validateOnParse property of the IXMLDOMDocument interface. Try pasting it into the Validator test page, turn off DTD validation, and then click Validate. Notice that the document loads and that the copyright character appears in the DOM tree at the end of the validator page.
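If the document already has a DTD of its own, the HTML entity declarations can be pulled into it through a parameter entity, which is declared in the DTD and then referenced as:
%HTMLENT;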
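The difference between an undefined and a defined entity can be seen with any XML parser. The sketch below uses Python's xml.etree.ElementTree rather than MSXML and, as an assumption made to keep it self-contained, declares the single copy entity in an internal DTD subset instead of loading an external HTML entity DTD.

```python
import xml.etree.ElementTree as ET

body = ("<copyright>Copyright &copy; 2000, Microsoft Inc, "
        "All rights reserved.</copyright>")

# Without a declaration, &copy; is an undefined entity and parsing fails.
try:
    ET.fromstring(body)
    undefined_entity_ok = True
except ET.ParseError:
    undefined_entity_ok = False

# Declaring the entity in the DOCTYPE (internal subset) makes it usable.
doctype = '<!DOCTYPE copyright [<!ENTITY copy "&#169;">]>'
text = ET.fromstring(doctype + body).text

print(undefined_entity_ok, text)
```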
Property    Behavior
data        Same as nodeValue.
text        Concatenates all the TEXT and CDATA nodes in the specified subtree and returns the combined result.
Note: Whitespace characters include newlines, tabs and spaces.
The nodeValue property usually returns the content as it appears in the original document, regardless of how the document was loaded and of the current xml:space scope.
The text property concatenates all the text in the specified subtree and expands entities. Its result depends on how the document was loaded, the current state of the preserveWhiteSpace switch and the current xml:space scope, as follows:
preserveWhiteSpace = true when the document is loaded:
preserveWhiteSpace=true    preserveWhiteSpace=true    preserveWhiteSpace=false    preserveWhiteSpace=false
xml:space=preserve         xml:space=default          xml:space=preserve          xml:space=default
Preserved                  Preserved                  Preserved                   Preserved and truncated

preserveWhiteSpace = false when the document is loaded:
preserveWhiteSpace=true    preserveWhiteSpace=true    preserveWhiteSpace=false    preserveWhiteSpace=false
xml:space=preserve         xml:space=default          xml:space=preserve          xml:space=default
"Preserved" here means that the text content is exactly the same as in the original XML document, and "truncated" means that leading and trailing whitespace has been removed. "Semi-preserved" means that significant whitespace is preserved and insignificant whitespace is normalized. Significant whitespace is whitespace within text content. Insignificant whitespace is the whitespace between tags.
Consider an element whose text, Jane and Smith, sits in two child elements, each child on its own line (\n) and indented with a tab (\t). The newlines and tabs between the tags are insignificant whitespace and can be ignored, while any whitespace next to Jane or Smith inside a child element is significant, because it is part of the text content. For this example, the text property returns the following:
State                      Return value
Preserved and truncated    "Jane\n\tSmith"
Note that "semi-preserved" normalizes insignificant whitespace: for example, a newline followed by a tab is reduced to a single space. If you change the xml:space attribute or the preserveWhiteSpace switch, the text property returns correspondingly different values.
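The "preserved" and "preserved and truncated" results can be approximated outside MSXML. The following sketch uses Python's xml.dom.minidom, which keeps every whitespace node, and a hypothetical helper named subtree_text that concatenates TEXT and CDATA nodes in the way described for the text property. The sample document and its first and last element names are assumptions, and the sketch does not attempt to emulate the semi-preserved normalization.

```python
from xml.dom import Node
from xml.dom.minidom import parseString


def subtree_text(node):
    """Concatenate every TEXT and CDATA node under node, in document order."""
    if node.nodeType in (Node.TEXT_NODE, Node.CDATA_SECTION_NODE):
        return node.data
    return "".join(subtree_text(child) for child in node.childNodes)


# Hypothetical sample; the child element names are illustrative only.
source = "<name>\n\t<first> Jane</first>\n\t<last>Smith </last>\n</name>"
root = parseString(source).documentElement

# "Preserved": the text exactly as it appears in the source document.
preserved = subtree_text(root)

# "Preserved and truncated": leading and trailing whitespace removed.
truncated = preserved.strip()

print(repr(preserved))
print(repr(truncated))
```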
CDATA and xml:space="preserve" subtree boundaries
The contents of CDATA nodes and of xml:space="preserve" subtrees are concatenated as they are, because they do not take part in the normalization of insignificant whitespace. For example, suppose that in the previous example Smith is placed inside a CDATA section together with a leading and a trailing space.
In this case, the whitespace inside the CDATA node is no longer merged with the insignificant whitespace around it and is not truncated. So the "semi-preserved and truncated" case returns the following:
"Jane Smith "
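Entities are a special case. Suppose the DOCTYPE of a document declares an entity named Jane whose replacement text contains elements separated by newlines and tabs, and the document element foo, which carries xml:space="preserve", contains only the entity reference:
&Jane;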
Assuming preserveWhiteSpace=false (in the scope of the DOCTYPE tag), insignificant whitespace is lost when the entity is parsed, so the entity does not contain any WHITESPACE nodes. The tree looks like this:
DOCTYPE foo
    ELEMENT: name
    ELEMENT: title
ELEMENT: foo
    ATTRIBUTE: xml:space="preserve"
    ENTITYREF: Jane
Note that the DOM tree exposed under the ENTITY node inside the DOCTYPE does not contain any WHITESPACE nodes. This means that the children of the ENTITYREF node do not have WHITESPACE nodes either, even though the entity reference is in the scope of xml:space="preserve". Each instance of an entity referenced in a given document usually has the same tree. If an entity absolutely must preserve whitespace, it must specify the xml:space attribute within itself, or the document's preserveWhiteSpace switch must be set to true.
How to deal with whitespace characters in attributes?
There are several ways to access an attribute value. The IXMLDOMAttribute interface has the nodeValue property, which is equivalent to the value property, as well as the Microsoft extensions nodeTypedValue and text. These properties return:
Property                                                    Text returned
attrNode.nodeValue, attrNode.value, getAttribute("name")    Exactly the content of the original document, with entities expanded.
attrNode.nodeTypedValue                                     Null
attrNode.text                                               Same as nodeValue, except that leading and trailing whitespace is truncated.
The XML specification defines the following behavior for XML applications:
Attribute type                                                Text returned
CDATA                                                         Semi-normalized
ID, IDREF, IDREFS, ENTITY, ENTITIES, NOTATION, enumeration    Fully normalized
"Semi-normalized" here means that newlines and tab characters are converted to spaces, but multiple spaces are not collapsed into one.
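Semi-normalization is part of the XML specification, so it can be observed with other conforming parsers too. The sketch below uses Python's xml.etree.ElementTree instead of MSXML; the foo element and a attribute are hypothetical, and because no DTD declares the attribute it is treated as CDATA. The final strip() call only imitates the trimming described for attrNode.text and is not an MSXML call.

```python
import xml.etree.ElementTree as ET

# The attribute value contains a literal newline and a literal tab.
source = '<foo a=" one\n\ttwo   three "/>'

# Semi-normalized: newline and tab each become one space, runs are kept.
value = ET.fromstring(source).get("a")

# Imitation of the text property: leading and trailing whitespace removed.
trimmed = value.strip()

print(repr(value))
print(repr(trimmed))
```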