(PHP 5, PHP 7)
Represents an entire HTML or XML document; serves as the root of the document tree.
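DOMDocument extends DOMNode {
/* Methods */
public __construct ([ string $version [, string $encoding ]] )
public DOMAttr createAttribute ( string $name )
public DOMAttr createAttributeNS ( string $namespaceURI , string $qualifiedName )
public DOMCDATASection createCDATASection ( string $data )
public DOMComment createComment ( string $data )
public DOMDocumentFragment createDocumentFragment ( void )
public DOMElement createElement ( string $name [, string $value ] )
public DOMElement createElementNS ( string $namespaceURI , string $qualifiedName [, string $value ] )
public DOMEntityReference createEntityReference ( string $name )
public DOMProcessingInstruction createProcessingInstruction ( string $target [, string $data ] )
public DOMText createTextNode ( string $content )
public DOMElement getElementById ( string $elementId )
public DOMNodeList getElementsByTagName ( string $name )
public DOMNodeList getElementsByTagNameNS ( string $namespaceURI , string $localName )
public DOMNode importNode ( DOMNode $importedNode [, bool $deep ] )
public mixed load ( string $filename [, int $options = 0 ] )
public bool loadHTML ( string $source [, int $options = 0 ] )
public bool loadHTMLFile ( string $filename [, int $options = 0 ] )
public mixed loadXML ( string $source [, int $options = 0 ] )
public void normalizeDocument ( void )
public bool registerNodeClass ( string $baseclass , string $extendedclass )
public bool relaxNGValidate ( string $filename )
public bool relaxNGValidateSource ( string $source )
public int save ( string $filename [, int $options ] )
public string saveHTML ([ DOMNode $node = NULL ] )
public int saveHTMLFile ( string $filename )
public string saveXML ([ DOMNode $node [, int $options ]] )
public bool schemaValidate ( string $filename [, int $flags ] )
public bool schemaValidateSource ( string $source [, int $flags ] )
public bool validate ( void )
public int xinclude ([ int $options ] )
/* Inherited methods */
public DOMNode DOMNode::appendChild ( DOMNode $newnode )
public string DOMNode::C14N ([ bool $exclusive [, bool $with_comments [, array $xpath [, array $ns_prefixes ]]]] )
public int DOMNode::C14NFile ( string $uri [, bool $exclusive [, bool $with_comments [, array $xpath [, array $ns_prefixes ]]]] )
public DOMNode DOMNode::cloneNode ([ bool $deep ] )
public int DOMNode::getLineNo ( void )
public string DOMNode::getNodePath ( void )
public bool DOMNode::hasAttributes ( void )
public bool DOMNode::hasChildNodes ( void )
public DOMNode DOMNode::insertBefore ( DOMNode $newnode [, DOMNode $refnode ] )
public bool DOMNode::isDefaultNamespace ( string $namespaceURI )
public bool DOMNode::isSameNode ( DOMNode $node )
public bool DOMNode::isSupported ( string $feature , string $version )
public string DOMNode::lookupNamespaceURI ( string $prefix )
public string DOMNode::lookupPrefix ( string $namespaceURI )
public void DOMNode::normalize ( void )
public DOMNode DOMNode::removeChild ( DOMNode $oldnode )
public DOMNode DOMNode::replaceChild ( DOMNode $newnode , DOMNode $oldnode )
}
actualEncoding
Deprecated. Actual encoding of the document; a read-only equivalent to encoding.
config
Deprecated. Configuration used when DOMDocument::normalizeDocument() is invoked.
doctype
The Document Type Declaration associated with this document.
documentElement
This is a convenience attribute that allows direct access to the child node that is the document element of the document.
documentURI
The location of the document, or NULL if undefined.
encoding
Encoding of the document, as specified by the XML declaration. This attribute is not present in the final DOM Level 3 specification, but is the only way of manipulating XML document encoding in this implementation.
formatOutput
Nicely formats output with indentation and extra space.
implementation
The DOMImplementation object that handles this document.
preserveWhiteSpace
Do not remove redundant white space. Defaults to TRUE.
recover
Proprietary. Enables recovery mode, i.e. trying to parse non-well-formed documents. This attribute is not part of the DOM specification and is specific to libxml.
resolveExternals
Set it to TRUE to load external entities from a doctype declaration. This is useful for including character entities in your XML document.
standalone
Deprecated. Whether or not the document is standalone, as specified by the XML declaration; corresponds to xmlStandalone.
strictErrorChecking
Throws DOMException on errors. Defaults to TRUE.
substituteEntities
Proprietary. Whether or not to substitute entities. This attribute is not part of the DOM specification and is specific to libxml.
validateOnParse
Loads and validates against the DTD. Defaults to FALSE.
version
Deprecated. Version of XML; corresponds to xmlVersion.
xmlEncoding
An attribute specifying, as part of the XML declaration, the encoding of this document. This is NULL when unspecified or when it is not known, such as when the Document was created in memory.
xmlStandalone
An attribute specifying, as part of the XML declaration, whether this document is standalone. This is FALSE when unspecified.
xmlVersion
An attribute specifying, as part of the XML declaration, the version number of this document. If there is no declaration and if this document supports the "XML" feature, the value is "1.0".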
Note:
The DOM extension uses UTF-8 encoding. Use utf8_encode() and utf8_decode() to work with texts in ISO-8859-1 encoding or Iconv for other encodings.
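As a rough illustration of the note above, the sketch below converts a Latin-1 string to UTF-8 with iconv() before handing it to the DOM, then converts a value read back out of the tree. The sample string and the element name "drink" are made up for the example.
<?php
// Assumed input: an ISO-8859-1 byte string ("\xE9" is "é" in that encoding).
$latin1 = "caf\xE9";

// Convert to UTF-8 before passing the text to the DOM extension.
$utf8 = iconv('ISO-8859-1', 'UTF-8', $latin1);

$doc = new DOMDocument('1.0', 'UTF-8');
$doc->appendChild($doc->createElement('drink', $utf8));
echo $doc->saveXML();

// Values read from the DOM are UTF-8; convert back if Latin-1 is needed.
$back = iconv('UTF-8', 'ISO-8859-1', $doc->documentElement->nodeValue);
?>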
[#1] ingjetel at gmail dot com [2015-05-13 23:54:39]
Easy function for basic output of XML file via DOM parsing
<?php
$dom = new DomDocument();
$dom->load("./file.xml") or die("error");
$start = $dom->documentElement;
fc($start);
// Recursively print the trimmed content of every non-empty text node
function fc($node) {
    $child = $node->childNodes;
    foreach ($child as $item) {
        if ($item->nodeType == XML_TEXT_NODE) {
            if (strlen(trim($item->nodeValue))) echo trim($item->nodeValue)."<br/>";
        }
        else if ($item->nodeType == XML_ELEMENT_NODE) fc($item);
    }
}
?>
[#2] qrworld.net [2014-11-11 16:35:33]
In this post http://softontherocks.blogspot.com/2014/11/descargar-el-contenido-de-una-url_11.html I found a simple way to get the content of a URL with DOMDocument, loadHTMLFile and saveHTML().
function getURLContent($url){
    $doc = new DOMDocument;
    $doc->preserveWhiteSpace = FALSE;
    // @ suppresses the warnings libxml raises for malformed HTML
    @$doc->loadHTMLFile($url);
    return $doc->saveHTML();
}
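The @ operator above hides every warning, including ones you may want to see. A possible alternative is to let libxml collect its messages internally and inspect them afterwards. This is only a sketch: the helper name loadHtmlQuietly and the sample markup are made up, and it parses a string with loadHTML() so that it runs without network access.
<?php
// Hypothetical helper: parse an HTML string and collect libxml messages
// instead of silencing them with @.
function loadHtmlQuietly($html, &$messages) {
    // Buffer libxml errors instead of emitting PHP warnings.
    $previous = libxml_use_internal_errors(true);

    $doc = new DOMDocument();
    $doc->loadHTML($html);

    $messages = array();
    foreach (libxml_get_errors() as $error) {
        $messages[] = trim($error->message);
    }

    // Empty the buffer and restore the previous error-handling mode.
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    return $doc;
}

$doc = loadHtmlQuietly('<p>Unclosed <b>markup</p>', $messages);
echo $doc->saveHTML();
?>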
[#3] danny dot nunez15 at gmail dot com [2013-10-28 14:54:33]
A simple function to grab all links in a page.
function get_links($url) {
    // Create a new DOM Document to hold our webpage structure
    $xml = new DOMDocument();
    // Load the url's contents into the DOM
    $xml->loadHTMLFile($url);
    // Empty array to hold all links to return
    $links = array();
    // Loop through each <a> tag in the DOM and collect its non-empty href
    foreach ($xml->getElementsByTagName('a') as $link) {
        // Use a separate variable so the $url parameter is not overwritten
        $href = $link->getAttribute('href');
        if (!empty($href)) {
            $links[] = $href;
        }
    }
    // Return the links
    return $links;
}
[#4] sites.sitesbr.net [2013-01-22 00:37:05]
How to objectify a DomDocument with a hierarchy like:
<root>
<item>
<prop1>info1</prop1>
<prop2>info2</prop2>
<prop3>info3</prop3>
</item>
<item>
<prop1>info1</prop1>
<prop2>info2</prop2>
<prop3>info3</prop3>
</item>
</root>
The items can then be read in object style, for example:
<?php
$theNodeValue = $aitem->prop1;
?>
Here is the code: one Class and 2 functions.
<?php
class ArrayNode{
public $nodeName, $nodeValue;
}
function getChildNodeElements( $domNode ){
$nodes = array();
for( $i=0; $i < $domNode->childNodes->length; $i++){
$cn = $domNode->childNodes->item($i);
if( $cn->nodeType == 1){
$nodes[] = $cn;
}
}
return $nodes;
}
function getArrayNodes( $domDoc ){
$res = array();
for( $i=0; $i < $domDoc->childNodes->length; $i++){
$cn = $domDoc->childNodes->item($i);
# The first is the root tag...
if( $cn->nodeType == 1){
# But we want its childNodes.
$sub_cn = getChildNodeElements( $cn);
# Found the tagName:
$baseItemTagName = $sub_cn[0]->nodeName;
break;
}
}
$dnl = $domDoc->getElementsByTagName( $baseItemTagName);
for( $i=0; $i< $dnl->length; $i++){
$arrayNode = new ArrayNode();
# Summary
$arrayNode->nodeName = $dnl->item($i)->nodeName;
$arrayNode->nodeValue = $dnl->item($i)->nodeValue;
# Child Nodes
$cn = $dnl->item($i)->childNodes;
for( $k=0; $k<$cn->length; $k++){
if( $cn->item($k)->nodeName == "#text" && trim($cn->item($k)->nodeValue) == "") continue;
$arrayNode->{$cn->item($k)->nodeName} = $cn->item($k)->nodeValue;
}
# Attributes
$attr = $dnl->item($i)->attributes;
for( $k=0; $k < $attr->length; $k++){
if(! is_null($attr)){
if( $attr->item($k)->nodeName == "#text" && trim($attr->item($k)->nodeValue) == "") continue;
$arrayNode->{$attr->item($k)->nodeName} = $attr->item($k)->nodeValue;
}
}
$res[] = $arrayNode;
}
return $res;
}
?>
To use it:
<?php
# First you load a XML in a DomDocument variable.
$url = "/path/to/yourxmlfile.xml";
$domSrc = file_get_contents($url);
$dom = new DomDocument();
$dom->loadXML( $domSrc );
# Then, you get the ArrayNodes from the DomDocument.
$ans = getArrayNodes( $dom );
for( $i=0; $i < count( $ans ) ; $i++){
$cn = $ans[ $i];
$info1 = $cn->prop1;
$info2 = $cn->prop2;
$info3 = $cn->prop3;
// ...
}
?>
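If all you need is this kind of object-style read access, the SimpleXML extension may already cover it: simplexml_import_dom() wraps an existing DOM node in a SimpleXMLElement. This is a different technique from the class above, shown only as a sketch; the inline XML is a shortened, made-up version of the sample hierarchy.
<?php
$dom = new DOMDocument();
$dom->loadXML(
    '<root>'
    . '<item><prop1>info1</prop1><prop2>info2</prop2></item>'
    . '<item><prop1>info3</prop1><prop2>info4</prop2></item>'
    . '</root>'
);

// Wrap the document's root element in a SimpleXMLElement.
$root = simplexml_import_dom($dom);

$values = array();
foreach ($root->item as $item) {
    // Cast to string to get the element's text content.
    $values[] = (string) $item->prop1;
}
print_r($values);
?>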
[#5] Nick M [2011-06-01 10:16:09]
You may need to save all or part of a DOMDocument as an XHTML-friendly string, something compliant with both XML and HTML 4. Here's the DOMDocument class extended with a saveXHTML method:
<?php
class XHTMLDocument extends DOMDocument {
public $selfTerminate = array(
'area','base','basefont','br','col','frame','hr','img','input','link','meta','param'
);
public function saveXHTML(DOMNode $node=null) {
if (!$node) $node = $this->firstChild;
$doc = new DOMDocument('1.0');
$clone = $doc->importNode($node->cloneNode(false), true);
$term = in_array(strtolower($clone->nodeName), $this->selfTerminate);
$inner='';
if (!$term) {
$clone->appendChild(new DOMText(''));
if ($node->childNodes) foreach ($node->childNodes as $child) {
$inner .= $this->saveXHTML($child);
}
}
$doc->appendChild($clone);
$out = $doc->saveXML($clone);
return $term ? substr($out, 0, -2) . ' />' : str_replace('><', ">$inner<", $out);
}
}
?>
This hasn't been benchmarked, but is probably significantly slower than saveXML or saveHTML and should be used sparingly.
[#6] evert at er dot nl [2010-11-20 02:17:48]
A nice and simple node 2 array I wrote, worth a try ;)
<?php
function getArray($node)
{
    $array = false;
    if ($node->hasAttributes())
    {
        foreach ($node->attributes as $attr)
        {
            $array[$attr->nodeName] = $attr->nodeValue;
        }
    }
    if ($node->hasChildNodes())
    {
        if ($node->childNodes->length == 1)
        {
            $array[$node->firstChild->nodeName] = $node->firstChild->nodeValue;
        }
        else
        {
            foreach ($node->childNodes as $childNode)
            {
                if ($childNode->nodeType != XML_TEXT_NODE)
                {
                    // Plain function, so recurse directly ($this is not available here)
                    $array[$childNode->nodeName][] = getArray($childNode);
                }
            }
        }
    }
    return $array;
}
?>
[#7] admin at beerpla dot net [2010-03-12 02:12:02]
After seeing many complaints about certain DOMDocument shortcomings, such as bad handling of encodings and always saving HTML fragments with <html>, <head>, and DOCTYPE, I decided that a better solution is needed.
So here it is: SmartDOMDocument. You can find it at http://beerpla.net/projects/smartdomdocument/
Currently, the main highlights are:
- SmartDOMDocument inherits from DOMDocument, so it's very easy to use - just declare an object of type SmartDOMDocument instead of DOMDocument and enjoy the new behavior on top of all existing functionality (see example below).
- saveHTMLExact() - DOMDocument has an extremely badly designed "feature" where if the HTML code you are loading does not contain <html> and <body> tags, it adds them automatically (yup, there are no flags to turn this behavior off).
Thus, when you call $doc->saveHTML(), your newly saved content now has <html><body> and DOCTYPE in it. Not very handy when trying to work with code fragments (XML has a similar problem).
SmartDOMDocument contains a new function called saveHTMLExact() which does exactly what you would want - it saves HTML without adding that extra garbage that DOMDocument does.
- encoding fix - DOMDocument notoriously doesn't handle encoding (at least UTF-8) correctly and garbles the output.
SmartDOMDocument tries to work around this problem by enhancing loadHTML() to deal with encoding correctly. This behavior is transparent to you - just use loadHTML() as you would normally.
- SmartDOMDocument Object As String - you can use a SmartDOMDocument object as a string which will print out its contents.
For example:
<?php
echo "Here is the HTML: $smart_dom_doc";
?>
I'm going to maintain this code and try to fix bugs as they come in.
Enjoy.
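On newer setups (PHP 5.4 or later, built against Libxml 2.7.8 or later), loadHTML() accepts the LIBXML_HTML_NOIMPLIED and LIBXML_HTML_NODEFDTD options, which may make a wrapper unnecessary for simple fragments. A minimal sketch, assuming those constants are available in your build; the fragment is made up:
<?php
$fragment = '<p>Hello <b>world</b></p>';

$doc = new DOMDocument();
// NOIMPLIED: do not add implied <html>/<body> elements.
// NODEFDTD: do not add a default DOCTYPE.
$doc->loadHTML($fragment, LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD);

$html = $doc->saveHTML();
echo $html;
?>
Fragments with several top-level elements may come out rearranged with these flags, so check the output before relying on it.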
[#8] tloach at gmail dot com [2010-02-05 10:01:16]
For anyone else who has been having issues with formatOutput not working, here is a workaround.
Rather than just doing something like:
<?php
$outXML = $xml->saveXML();
?>
force it to reload the XML from scratch, then it will format correctly:
<?php
$outXML = $xml->saveXML();
$xml = new DOMDocument();
$xml->preserveWhiteSpace = false;
$xml->formatOutput = true;
$xml->loadXML($outXML);
$outXML = $xml->saveXML();
?>
[#9] jay at jaygilford dot com [2010-01-27 08:46:16]
Here's a small function I wrote to get all page links using the DOMDocument which will hopefully be of use to others
<?php
function get_links($url) {
// Create a new DOM Document to hold our webpage structure
$xml = new DOMDocument();
// Load the url's contents into the DOM
$xml->loadHTMLFile($url);
// Empty array to hold all links to return
$links = array();
//Loop through each <a> tag in the dom and add it to the link array
foreach($xml->getElementsByTagName('a') as $link) {
$links[] = array('url' => $link->getAttribute('href'), 'text' => $link->nodeValue);
}
//Return the links
return $links;
}
?>
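A variation on the same idea is to select only the anchors that actually carry an href, using DOMXPath. This is a sketch rather than a drop-in replacement: it parses a made-up HTML string with loadHTML() instead of fetching a URL, so that it is self-contained.
<?php
// Made-up sample markup; the second <a> has no href and should be skipped.
$html = '<p><a href="/one">One</a> <a name="anchor">no href</a> <a href="/two">Two</a></p>';

$doc = new DOMDocument();
$doc->loadHTML($html);

$xpath = new DOMXPath($doc);
$links = array();
// "//a[@href]" matches every <a> element that has an href attribute.
foreach ($xpath->query('//a[@href]') as $a) {
    $links[] = array('url' => $a->getAttribute('href'), 'text' => $a->nodeValue);
}
print_r($links);
?>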
[#10] fcartegnie [2009-10-31 13:30:18]
Be careful with formatOutput.
Creating an empty node like this:
createElement('foo','')
instead of
createElement('foo')
will break formatOutput.
[#11] PhilipWayneRollins at gmail dot com [2009-08-15 13:32:11]
If you want to use DOMDocument to create XHTML documents, here is a simple class.
Note that it is designed for creating XHTML documents from scratch, but could easily be extended to work with existing XHTML documents. Also, it targets XHTML, not generic XML.
<?php
class Document
{
public $doctype;
public $head;
public $title = 'Sensei Ninja';
public $body;
private $styles;
private $metas;
private $scripts;
private $document;
function __construct ( )
{
$this->document = new DOMDocument( );
$this->head = $this->document->createElement( 'head', ' ' );
$this->body = $this->document->createElement( 'body', ' ' );
}
public function addStyleSheet ( $url, $media='all' )
{
$element = $this->document->createElement( 'link' );
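$element->setAttribute( 'rel', 'stylesheet' ); // without rel, browsers will not apply the stylesheet
$element->setAttribute( 'type', 'text/css' );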
$element->setAttribute( 'href', $url );
$element->setAttribute( 'media', $media );
$this->styles[] = $element;
}
public function addScript ( $url )
{
$element = $this->document->createElement( 'script', ' ' );
$element->setAttribute( 'type', 'text/javascript' );
$element->setAttribute( 'src', $url );
$this->scripts[] = $element;
}
public function addMetaTag ( $name, $content )
{
$element = $this->document->createElement( 'meta' );
$element->setAttribute( 'name', $name );
$element->setAttribute( 'content', $content );
$this->metas[] = $element;
}
public function setDescription ( $dec )
{
$this->addMetaTag( 'description', $dec );
}
public function setKeywords ( $keywords )
{
$this->addMetaTag( 'keywords', $keywords );
}
public function createElement ( $nodeName, $nodeValue=null )
{
return $this->document->createElement( $nodeName, $nodeValue );
}
public function assemble ( )
{
// Doctype creation
$doctype = '<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">';
// Create the head element
$title = $this->document->createElement( 'title', $this->title );
// Add stylesheets if needed
if ( is_array( $this->styles ))
foreach ( $this->styles as $element )
$this->head->appendChild( $element );
// Add scripts if needed
if( is_array( $this->scripts ))
foreach ( $this->scripts as $element )
$this->head->appendChild( $element );
// Add meta tags if needed
if ( is_array( $this->metas ))
foreach ( $this->metas as $element )
$this->head->appendChild( $element );
$this->head->appendChild( $title );
// Create the document
$html = $this->document->createElement( 'html' );
$html->setAttribute( 'xmlns', 'http://www.w3.org/1999/xhtml' );
$html->setAttribute( 'xml:lang', 'en' );
$html->setAttribute( 'lang', 'en' );
$html->appendChild( $this->head );
$html->appendChild( $this->body );
$this->document->appendChild( $html );
// Serialize only the <html> element so that no XML declaration ends up after the doctype
return $doctype . $this->document->saveXML( $html );
}
}
?>
Small example
<?php
$document = new Document( );
$document->title = 'Hello';
$document->addStyleSheet( 'StyleSheets/main.css' );
$div = $document->createElement( 'div' );
$div->nodeValue = 'Hello, world!';
$div->setAttribute( 'style', 'color: red;' );
$document->body->appendChild( $div );
printf( '%s', $document->assemble( ) );
?>
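Instead of prepending the doctype as a string, the doctype can also be attached to the document itself through DOMImplementation. The following is a minimal sketch of that approach, not a rewrite of the class above; the element contents are made up.
<?php
$ns = 'http://www.w3.org/1999/xhtml';

$impl = new DOMImplementation();
$doctype = $impl->createDocumentType(
    'html',
    '-//W3C//DTD XHTML 1.0 Transitional//EN',
    'http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd'
);

// Create the document with <html> as root element and the doctype attached.
$doc = $impl->createDocument($ns, 'html', $doctype);
$html = $doc->documentElement;

$head = $html->appendChild($doc->createElementNS($ns, 'head'));
$head->appendChild($doc->createElementNS($ns, 'title', 'Hello'));

$body = $html->appendChild($doc->createElementNS($ns, 'body'));
$body->appendChild($doc->createElementNS($ns, 'div', 'Hello, world!'));

$out = $doc->saveXML();
echo $out;
?>
With this approach saveXML() emits the XML declaration, the doctype and the root element in the correct order on its own.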
[#12] cmyk777 at gmail dot com [2009-05-23 12:31:31]
This function may help to debug current dom element:
<?php
function dom_dump($obj) {
if ($classname = get_class($obj)) {
$retval = "Instance of $classname, node list: \n";
switch (true) {
case ($obj instanceof DOMDocument):
$retval .= "XPath: {$obj->getNodePath()}\n".$obj->saveXML($obj);
break;
case ($obj instanceof DOMElement):
$retval .= "XPath: {$obj->getNodePath()}\n".$obj->ownerDocument->saveXML($obj);
break;
case ($obj instanceof DOMAttr):
$retval .= "XPath: {$obj->getNodePath()}\n".$obj->ownerDocument->saveXML($obj);
//$retval .= $obj->ownerDocument->saveXML($obj);
break;
case ($obj instanceof DOMNodeList):
for ($i = 0; $i < $obj->length; $i++) {
$retval .= "Item #$i, XPath: {$obj->item($i)->getNodePath()}\n".
"{$obj->item($i)->ownerDocument->saveXML($obj->item($i))}\n";
}
break;
default:
return "Instance of unknown class";
}
} else {
return 'no elements...';
}
return htmlspecialchars($retval);
}
?>
Example usage:
<?php
$dom = new DomDocument();
$dom->load('test.xml');
$body = $dom->documentElement->getElementsByTagName('book');
echo '<pre>'.dom_dump($body).'</pre>';
?>
Output:
Instance of DOMNodeList, node list:
Item #0, XPath: /library/book[1]
<book isbn="0345342968">
<title>Fahrenheit 451</title>
<author>R. Bradbury</author>
<publisher>Del Rey</publisher>
</book>
Item #1, XPath: /library/book[2]
<book isbn="0048231398">
<title>The Silmarillion</title>
<author>J.R.R. Tolkien</author>
<publisher>G. Allen & Unwin</publisher>
</book>
Item #2, XPath: /library/book[3]
<book isbn="0451524934">
<title>1984</title>
<author>G. Orwell</author>
<publisher>Signet</publisher>
</book>
Item #3, XPath: /library/book[4]
<book isbn="031219126X">
<title>Frankenstein</title>
<author>M. Shelley</author>
<publisher>Bedford</publisher>
</book>
Item #4, XPath: /library/book[5]
<book isbn="0312863551">
<title>The Moon Is a Harsh Mistress</title>
<author>R. A. Heinlein</author>
<publisher>Orb</publisher>
</book>
[#13] Fernando H [2008-04-11 00:48:01]
Here is a quick example of how to use this class, so that new users can get started without having to figure it all out by themselves. (At the time of posting, this documentation had just been added and was lacking examples.)
<?php
// Set the content type to be XML, so that the browser will recognise it as XML.
header( "content-type: application/xml; charset=ISO-8859-15" );
// "Create" the document.
$xml = new DOMDocument( "1.0", "ISO-8859-15" );
// Create some elements.
$xml_album = $xml->createElement( "Album" );
$xml_track = $xml->createElement( "Track", "The ninth symphony" );
// Set the attributes.
$xml_track->setAttribute( "length", "0:01:15" );
$xml_track->setAttribute( "bitrate", "64kb/s" );
$xml_track->setAttribute( "channels", "2" );
// Create another element, just to show that you can nest as many sublevels as the machine can reasonably handle.
$xml_note = $xml->createElement( "Note", "The last symphony composed by Ludwig van Beethoven." );
// Append the whole bunch.
$xml_track->appendChild( $xml_note );
$xml_album->appendChild( $xml_track );
// Repeat the above with some different values..
$xml_track = $xml->createElement( "Track", "Highway Blues" );
$xml_track->setAttribute( "length", "0:01:33" );
$xml_track->setAttribute( "bitrate", "64kb/s" );
$xml_track->setAttribute( "channels", "2" );
$xml_album->appendChild( $xml_track );
$xml->appendChild( $xml_album );
// Serialize and output the XML.
print $xml->saveXML();
?>
Output (XML declaration omitted; indented here for readability):
<Album>
<Track length="0:01:15" bitrate="64kb/s" channels="2">
The ninth symphony
<Note>
The last symphony composed by Ludwig van Beethoven.
</Note>
</Track>
<Track length="0:01:33" bitrate="64kb/s" channels="2">Highway Blues</Track>
</Album>
If you want your PHP->DOM code to run under the .xml extension, you should set your webserver up to run the .xml extension with PHP (refer to the PHP installation/configuration documentation on how to do this).
Note that this:
<?php
$xml = new DOMDocument( "1.0", "ISO-8859-15" );
$xml_album = $xml->createElement( "Album" );
$xml_track = $xml->createElement( "Track" );
$xml_album->appendChild( $xml_track );
$xml->appendChild( $xml_album );
?>
is NOT the same as this:
<?php
// Will NOT work.
$xml = new DOMDocument( "1.0", "ISO-8859-15" );
$xml_album = new DOMElement( "Album" );
$xml_track = new DOMElement( "Track" );
$xml_album->appendChild( $xml_track );
$xml->appendChild( $xml_album );
?>
although this will work:
<?php
$xml = new DOMDocument( "1.0", "ISO-8859-15" );
$xml_album = new DOMElement( "Album" );
$xml->appendChild( $xml_album );
?>
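The likely reason the "Will NOT work" variant fails is that, per the DOMElement constructor documentation, an element created with new is read-only until it is associated with a document, so it cannot take children yet. A sketch of an ordering that should work with directly constructed elements: attach each element to the tree first, then append to it. It uses the same element names as above and keeps the node returned by appendChild().
<?php
$xml = new DOMDocument( "1.0", "ISO-8859-15" );

// Attach the element to the document first; keep the returned node.
$xml_album = $xml->appendChild( new DOMElement( "Album" ) );

// Now that Album belongs to a document, children can be appended to it.
$xml_track = $xml_album->appendChild( new DOMElement( "Track" ) );

$out = $xml->saveXML();
print $out;
?>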