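Preserving URLs Inside HTML Tags While Converting Untagged URLs
In an HTML document, you may need to turn plain-text URLs into clickable links while leaving alone any URL that already sits inside an HTML tag. This is harder than it sounds, because common text-replacement approaches such as a single regex pass over the markup also match URLs inside tags and attributes.
Problem Statement
The following HTML text snippet illustrates the problem: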
I need your help here. I want to turn this:
sometext sometext http://www.somedomain.com/index.html sometext sometext
into:
sometext sometext <a href="http://www.somedomain.com/index.html">www.somedomain.com/index.html</a> sometext sometext
However, the existing regex solution also targets URLs within img tags:
sometext sometext <img src="http://domain.com/image.jpg"> sometext sometext
Converting this accidentally produces:
sometext sometext <img src="<a href="http://domain.com/image.jpg">domain.com/image.jpg</a>"> sometext sometext
Solution
To isolate and replace only the URLs that are not inside HTML tags, we can use XPath and DOM manipulation. An XPath query selects the text nodes that contain a URL and skips any that are descendants of anchor tags. URLs in attributes, such as the src of an img tag, are not text nodes, so the query never selects them:
$texts = $xPath->query(
'/html/body//text()[ not(ancestor::a) and ( contains(.,"http://") or contains(.,"https://") or contains(.,"ftp://") )]'
);
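The query assumes that $dom and $xPath already exist. A minimal setup sketch follows, assuming the markup is available as a string; the sample $html is hypothetical and only there to show which nodes the query selects.

```php
<?php
// Hypothetical input: one bare URL, one URL in an img attribute,
// and one URL that is already wrapped in an anchor.
$html = '<html><body>'
      . '<p>sometext http://www.somedomain.com/index.html sometext</p>'
      . '<p><img src="http://domain.com/image.jpg"></p>'
      . '<p><a href="http://example.org/">http://example.org/</a></p>'
      . '</body></html>';

$dom = new DOMDocument();
$dom->loadHTML($html);        // parse the markup into a DOM tree
$xPath = new DOMXPath($dom);  // XPath evaluator bound to that tree

// Same query as in the article, spread over several lines for readability.
$texts = $xPath->query(
    '/html/body//text()[
        not(ancestor::a) and (
            contains(., "http://") or
            contains(., "https://") or
            contains(., "ftp://")
        )]'
);

// Print the text nodes the query selected.
foreach ($texts as $text) {
    echo $text->data, PHP_EOL;
}
```

With this input, only the text node of the first paragraph should be selected: the img URL lives in an attribute, and the third URL has an anchor ancestor.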
Once these text nodes are identified, we can replace them with document fragments containing the appropriate anchor elements. This ensures that the URLs are converted without affecting the surrounding HTML structure:
foreach ($texts as $text) {
    $fragment = $dom->createDocumentFragment();
    // Wrap every matched URL in an anchor; $1 is the captured URL.
    $fragment->appendXML(
        preg_replace(
            "~((?:http|https|ftp)://(?:\S*?\.\S*?))(?=\s|\;|\)|\]|\[|\{|\}|,|\"|'|:|\<|$|\.\s)~i",
            '<a href="$1">$1</a>',
            $text->data
        )
    );
    // Swap the original text node for the fragment that contains the links.
    $text->parentNode->replaceChild($fragment, $text);
}
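One caveat with appendXML is that it parses its argument as XML, so a text node containing a bare & (common in query strings) can make the call fail. The sketch below is a variant, not the approach above: it splits the text with preg_split and builds the anchors with createElement and createTextNode, so no escaping is needed. The function name linkifyHtml is hypothetical; the pattern and the XPath query are the ones already shown.

```php
<?php
// Hypothetical helper: returns the document with bare URLs turned into links.
function linkifyHtml(string $html): string
{
    // URL pattern taken from the loop above.
    $pattern = "~((?:http|https|ftp)://(?:\S*?\.\S*?))(?=\s|\;|\)|\]|\[|\{|\}|,|\"|'|:|\<|$|\.\s)~i";

    $dom = new DOMDocument();
    $dom->loadHTML($html);
    $xPath = new DOMXPath($dom);

    $texts = $xPath->query(
        '/html/body//text()[not(ancestor::a) and '
        . '(contains(.,"http://") or contains(.,"https://") or contains(.,"ftp://"))]'
    );

    foreach ($texts as $text) {
        // With PREG_SPLIT_DELIM_CAPTURE the result alternates:
        // plain text, URL, plain text, URL, ...
        $parts = preg_split($pattern, $text->data, -1, PREG_SPLIT_DELIM_CAPTURE);

        $fragment = $dom->createDocumentFragment();
        foreach ($parts as $i => $part) {
            if ($i % 2 === 1) {
                // Odd positions hold the captured URLs.
                $a = $dom->createElement('a');
                $a->setAttribute('href', $part);
                $a->appendChild($dom->createTextNode($part));
                $fragment->appendChild($a);
            } elseif ($part !== '') {
                // Even positions hold the surrounding text.
                $fragment->appendChild($dom->createTextNode($part));
            }
        }
        $text->parentNode->replaceChild($fragment, $text);
    }

    return $dom->saveHTML();
}

// Usage: the bare URL becomes a link, the img src is left untouched.
echo linkifyHtml(
    '<p>sometext http://www.somedomain.com/index.html sometext '
    . '<img src="http://domain.com/image.jpg"></p>'
);
```

Because loadHTML parses a full document, saveHTML returns the result wrapped in a doctype, html, and body; extracting only the body content is left out of this sketch.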