A common approach to finding links on a page is to use regular expressions. However, in cases like this:
<a title="this" href="that">what?</a>
where the href attribute is not the first attribute of the a tag, a regex such as the following may fail to extract the URL:
/<a\s[^>]*href=(\"\'??)([^\"\' >]*?)[^>]*>(.*)<\/a>/
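To see the failure concretely, the sketch below runs that exact pattern through PHP's preg_match() against the sample tag. The pattern does match, but the group meant to hold the URL comes back empty. The lazy *? lets that group match zero characters while the following [^>]* consumes that" instead. The same thing happens when href is the first attribute, so attribute order is not the only trap in a pattern like this.

```php
<?php
// Sample tag from above: href is not the first attribute.
$html = '<a title="this" href="that">what?</a>';

// The regex from above, unchanged. In a single-quoted PHP string, \' becomes '
// and \" stays as \", which PCRE reads as a literal double quote.
$pattern = '/<a\s[^>]*href=(\"\'??)([^\"\' >]*?)[^>]*>(.*)<\/a>/';

if (preg_match($pattern, $html, $m) === 1) {
    // Group 2 was meant to capture the URL, but the lazy *? matches nothing
    // and the trailing [^>]* swallows the value instead.
    echo 'URL group: "', $m[2], '"', PHP_EOL;   // prints: URL group: ""
    echo 'Link text: "', $m[3], '"', PHP_EOL;   // prints: Link text: "what?"
}
```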
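Writing a regular expression that handles HTML reliably is hard: attributes can appear in any order, values may use double quotes, single quotes, or no quotes at all, and a quoted value may itself contain a > character. A more robust alternative is to parse the markup with the DOM (Document Object Model).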
Here's how you can use PHP's DOMDocument class to retrieve the href attribute and other information from a elements (here $html holds the page's markup):
```php
$dom = new DOMDocument;
$dom->loadHTML($html); // $html holds the page's markup

// Loop through all 'a' elements
foreach ($dom->getElementsByTagName('a') as $node) {
    // Output the entire 'a' element's outer HTML
    echo $dom->saveHTML($node), PHP_EOL;
    // Get the node's text value
    echo $node->nodeValue, PHP_EOL;
    // Check whether the node has an 'href' attribute
    echo $node->hasAttribute('href') ? 'has href' : 'no href', PHP_EOL;
    // Get the 'href' attribute's value ('' if the attribute is missing)
    echo $node->getAttribute('href'), PHP_EOL;
    // Change the 'href' attribute's value
    $node->setAttribute('href', 'something else');
    // Remove the 'href' attribute
    $node->removeAttribute('href');
}
```
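The loop above assumes $html is already defined. As a self-contained sketch, the helper below wraps the same DOMDocument calls into a function that collects every href value. The function name extractHrefs() and the sample markup are hypothetical. Silencing libxml warnings is an added assumption, on the premise that real pages are often not well-formed.

```php
<?php
// Hypothetical helper: collect the href of every <a> element that has one.
function extractHrefs(string $html): array
{
    if ($html === '') {
        return []; // loadHTML() rejects an empty string
    }
    $dom = new DOMDocument();
    // Collect libxml warnings about imperfect markup instead of printing
    // them, then restore the previous setting.
    $previous = libxml_use_internal_errors(true);
    $dom->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $hrefs = [];
    foreach ($dom->getElementsByTagName('a') as $node) {
        // getAttribute() returns '' for a missing attribute, so check first
        if ($node->hasAttribute('href')) {
            $hrefs[] = $node->getAttribute('href');
        }
    }
    return $hrefs;
}

$html = '<p><a title="this" href="that">what?</a> <a name="x">no link</a>'
      . " <a href='/about'>About</a></p>";
print_r(extractHrefs($html));
```

Only the two anchors that carry an href contribute, so the result is ['that', '/about'], regardless of attribute order or quote style.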
XPath can also be used to query for specific attributes, such as the href attribute:
```php
$dom = new DOMDocument;
$dom->loadHTML($html);
$xpath = new DOMXPath($dom);
// Select the href attribute node (DOMAttr) of every 'a' element
$nodes = $xpath->query('//a/@href');
foreach ($nodes as $href) {
    echo $href->nodeValue, PHP_EOL;               // echo current attribute value
    $href->nodeValue = 'new value';               // set new attribute value
    $href->ownerElement->removeAttribute('href'); // remove attribute from its element
}
```
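Because '//a/@href' returns attribute nodes, each result also gives access to the element that owns it. The sketch below, using hypothetical sample markup, pairs every URL with its link text through the standard DOMAttr::$ownerElement property. Anchors without an href never appear in the result set, so no hasAttribute() check is needed.

```php
<?php
$html = '<ul><li><a title="this" href="that">what?</a></li>'
      . '<li><a href="/docs">Docs</a></li><li><a>none</a></li></ul>';

$dom = new DOMDocument();
libxml_use_internal_errors(true); // keep warnings about imperfect markup quiet
$dom->loadHTML($html);
libxml_clear_errors();

$xpath = new DOMXPath($dom);
$links = [];
// Each $attr is a DOMAttr; ownerElement is the <a> element carrying it.
foreach ($xpath->query('//a/@href') as $attr) {
    $links[] = [$attr->value, $attr->ownerElement->textContent];
}
print_r($links);
```

A list of pairs is used rather than an array keyed by URL, so that two links pointing to the same address are both kept.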
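With the DOM, the parser rather than a hand-written pattern interprets the markup, so attributes like href on a elements are found regardless of their order or quoting style, and can be read, changed, or removed with a few methods. This makes the approach more reliable and flexible than regular expressions for working with HTML.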