Replace text in a string while ignoring matches inside HTML tags
P粉676821490
P粉676821490 2024-03-27 19:23:55

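For a given string (usually a paragraph), I want to replace certain words/phrases, but skip any occurrence that is wrapped in a tag in some way. The matching also needs to be case-insensitive.

Take this as an example: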

You can find a link here <a href="#">link</a> and a lot 
of things in different styles. Public platform can appear in bold: 
<b>public platform</b>, and we also have italics here too: <i>italics</i>. 
While I like soft pillows I am picky about soft <i>pillows</i>. 
While I want to find fox, I din't want foxes to show up.
The text "shiny fruits" is in a span tag:  one of the <span>shiny fruits</span>.

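Suppose I want to replace these words:

  • link: appears 2 times. The first is plain text (match), the second is inside an A tag (ignore)
  • public platform: plain text (match, case-insensitive), the second is inside a B tag (ignore)
  • soft pillows: 1 plain-text match.
  • fox: 1 plain-text match. It only matches the whole word.
  • fruits: plain text (match), the second is inside a span tag together with other text (ignore)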

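As background: I am searching for phrase matches (not single words) and linking each match to a relevant page.

I want to avoid nested HTML (no links inside bold tags, and vice versa) and other errors (for example: the <a href="#">phrase <b>goes</a> here</b>)

I tried several approaches, such as searching a sanitized copy of the text with the HTML stripped out. That tells me a match exists, but it leaves me with a whole new problem of mapping the match back to the original content.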
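One way to sidestep the mapping problem is to never strip the tags at all. The sketch below is not from this thread: it splits the string into alternating text and tag tokens with `preg_split`, tracks how deep the current position is nested, and replaces only in text at depth 0. The function name `replaceInTopLevelText` is hypothetical, and the code assumes validly paired tags, no void elements such as `<br>`, and no bare `<` or `>` in the text.

```php
<?php
// Hypothetical helper (not from the thread): replace $phrase only in text
// that is not wrapped in any element.
function replaceInTopLevelText(string $html, string $phrase, string $replacement): string
{
    // With one capture group, the result alternates: text, tag, text, tag, ...
    $tokens = preg_split('/(<[^>]*>)/', $html, -1, PREG_SPLIT_DELIM_CAPTURE);
    // Whole phrase only, case-insensitive.
    $pattern = '/\b' . preg_quote($phrase, '/') . '\b/i';
    $depth = 0;

    foreach ($tokens as $i => $token) {
        if ($i % 2 === 1) {
            // Odd indexes are tags: closing tags lower the depth,
            // opening tags raise it, self-closing tags leave it alone.
            if (strncmp($token, '</', 2) === 0) {
                $depth--;
            } elseif (substr($token, -2) !== '/>') {
                $depth++;
            }
            continue;
        }
        if ($depth === 0) {
            $tokens[$i] = preg_replace($pattern, $replacement, $token);
        }
    }

    // Tags were never removed, so nothing has to be mapped back.
    return implode('', $tokens);
}

$sample = 'Find a link here <a href="#">link</a>. Public platform vs <b>public platform</b>.';
echo replaceInTopLevelText($sample, 'link', '[X]'), "\n";
// Find a [X] here <a href="#">link</a>. Public platform vs <b>public platform</b>.
```

Because the replacement text is inserted only between existing top-level tags, wrapping a match in a new `<a>` element this way should not produce nested or crossed tags, as long as the assumptions above hold.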


All replies (1)
P粉594941301

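I came across mentions of regex negative lookahead, and after racking my brain I ended up with this regex (assuming your HTML tags are VALID, properly paired tags)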

// made function a bit ugly just to try to show how it comes together
public function replaceTextOutsideTags($sourceText = null, $toReplace = 'inner text', $dummyText = '(REPLACED TEXT HERE)')
{
  $string = $sourceText ?? "Inner text
  You can find a link here link and a lot 
  of things in different styles. Public platform can appear in bold: 
  public platform, and we also have italics here too: italics. 
  While I like soft pillows I am picky about soft pillows. 
  While I want to find fox, I din't want foxes to show up.
  The text \"shiny fruits\" is in a span tag:  one of the shiny fruits.
  The inner text like this inner inner text  here to test too, event inner text
  omg thats sad... or not
  ";
  // it would be nice to use [[:punct:]] but that class also treats < and > as punctuation marks
  $punctuation = "\.,!\?:;\|\/=\"#"; // this part might take additional attention but you get the point
  $stringPart = "\b$toReplace\b"; // whole phrase only
  $excludeSequence = "(?![\w\n\s>$punctuation]*?";
  $excludeOutside = "$excludeSequence<\/)"; // note on closing ); skips text that runs straight into a closing tag
  $excludeTag = "$excludeSequence>)"; // note on closing ); skips matches that sit inside a tag itself
  $pattern = "/" . $stringPart . $excludeOutside . $excludeTag . "/im";
  
  return preg_replace($pattern, $dummyText, $string);
}

Example output with the default parameters

"""
     (REPLACED TEXT HERE)\r\n
     You can find a link here link and a lot \r\n
     of things in different styles. Public platform can appear in bold: \r\n
     public platform, and we also have italics here too: italics. \r\n
     While I like soft pillows I am picky about soft pillows. \r\n
     While I want to find fox, I din't want foxes to show up.\r\n
     The text "shiny fruits" is in a span tag:  one of the shiny fruits.\r\n
     The (REPLACED TEXT HERE) like this inner inner text  here to test too, event (REPLACED TEXT HERE)\r\n
     omg thats sad... or not     
     """

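Now step by step

  1. No match when the phrase is followed by more word characters (we don't want pillow if the text only contains pillowS); the \b word boundaries take care of that.
  2. If the text is followed by any number of \w word characters, \s whitespace, \n newlines and the allowed punctuation, ending at the start of a closing tag </, we don't want that match. This is where the negative lookahead (?![\w\n\s>$punctuation]*? comes in. We can be sure the scan will not run into a new tag, because < is not part of the described sequence ($excludeOutside variable).
  3. The $excludeTag variable is basically the same as $excludeOutside, but covers the case where $toReplace could be an HTML tag name itself, e.g. a

Note that this code cannot handle text that itself contains < or >; using those symbols may cause unexpected behavior.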
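The three steps can be tried on a short string. This is a minimal sketch built from the same pieces as the function above; the helper names `buildPattern` and `demoReplace` are hypothetical, the `[X]` marker is a placeholder, and the two lookahead endings (`<\/` and `>`) are assumptions taken from the step-by-step description rather than a verified copy of the original pattern.

```php
<?php
// Hypothetical helper: assemble the pattern from the three pieces described above.
function buildPattern(string $toReplace): string
{
    $punctuation = '\.,!\?:;\|\/="#';
    // Characters the lookahead may cross; note that "<" is deliberately absent.
    $sequence = '(?![\w\s>' . $punctuation . ']*?';

    return '/\b' . preg_quote($toReplace, '/') . '\b' // step 1: whole phrase only
        . $sequence . '<\/)'                         // step 2: not followed by a closing tag
        . $sequence . '>)'                           // step 3: not inside a tag itself
        . '/i';
}

// Hypothetical helper: run the replacement with a fixed marker.
function demoReplace(string $text, string $toReplace): string
{
    return preg_replace(buildPattern($toReplace), '[X]', $text);
}

echo demoReplace('A link here <a href="#">link</a> and links.', 'link'), "\n";
// A [X] here <a href="#">link</a> and links.

echo demoReplace('a <a href="#">b</a>', 'a'), "\n";
// [X] <a href="#">b</a>
```

In the first call, the plain-text `link` is replaced, the one inside the anchor is skipped because `</` follows it directly, and `links` fails the word-boundary check. In the second call, the tag name `a` is left alone in both the opening and the closing tag.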