Replace text in a string while ignoring matches inside HTML tags

Question

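For a given string (usually a paragraph), I want to replace some words/phrases, but ignore them if they happen to be surrounded by tags in some way. The replacement also needs to be case-insensitive. Take this as an example: You can find a link here link and a lot of things in different styles. Public platform can appear in bold: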

P粉594941301 · Answer

I found mentions of regex negative lookahead, and after racking my brain I ended up with this regex (assuming you have VALID, properly paired HTML tags):

// made function a bit ugly just to try to show how it comes together
function replaceTextOutsideTags($sourceText = null, $toReplace = 'inner text', $dummyText = '(REPLACED TEXT HERE)')
{
  $string = $sourceText ?? "Inner text
  You can find a link here link and a lot 
  of things in different styles. Public platform can appear in bold: 
  public platform, and we also have italics here too: italics. 
  While I like soft pillows I am picky about soft pillows. 
  While I want to find fox, I din't want foxes to show up.
  The text \"shiny fruits\" is in a span tag:  one of the shiny fruits.
  The inner text like this inner inner text  here to test too, event inner text
  omg thats sad... or not
  ";
  // it would be nice to use [[:punct:]] but somehow regex thinks that < and > are also punctuation marks
  $punctuation = "\.,!\?:;\\|\/=\"#"; // this part might take additional attention but you get the point
  // \b word boundaries: whole words/phrases only; preg_quote() escapes regex metacharacters in the search term
  $stringPart = "\b" . preg_quote($toReplace, '/') . "\b";
  // shared start of both lookaheads: word chars, whitespace, '>' and punctuation, but never '<'
  $excludeSequence = "(?![\w\n\s>$punctuation]*?";
  // reject the match if the next tag after it is a closing tag
  $excludeOutside = "$excludeSequence<\/)"; // note on closing )
  // reject the match if a '>' comes before any '<', i.e. the match sits inside a tag
  $excludeTag = "$excludeSequence>)";
  // i: case-insensitive matching
  $pattern = "/" . $stringPart . $excludeOutside . $excludeTag . "/im";

  return preg_replace($pattern, $dummyText, $string);
}
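To see which occurrences the pattern actually skips, it helps to run it on input that still has its markup. The sketch below is a compact restatement, not the answer's exact code: it assumes the two lookaheads end in `<\/)` (`$excludeOutside`) and `>)` (`$excludeTag`) as described in the step-by-step notes, wraps the search term in `preg_quote()`, and uses made-up sample HTML.

```php
<?php
// Compact restatement of the answer's pattern, assuming the lookahead
// endings '<\/)' ($excludeOutside) and '>)' ($excludeTag).
function replaceTextOutsideTags(string $html, string $toReplace, string $dummyText): string
{
    $punctuation = "\.,!\?:;\\|\/=\"#";
    $excludeSequence = "(?![\w\s>$punctuation]*?";
    $pattern = '/\b' . preg_quote($toReplace, '/') . '\b'
        . $excludeSequence . '<\/)'  // next tag is a closing tag: skip
        . $excludeSequence . '>)'    // match sits inside a tag: skip
        . '/i';                      // case-insensitive

    return preg_replace($pattern, $dummyText, $html);
}

$html = '<p>Inner text and <span>inner inner text</span>, then inner text again.<br></p>';
echo replaceTextOutsideTags($html, 'inner text', '(REPLACED)'), PHP_EOL;
// <p>(REPLACED) and <span>inner inner text</span>, then (REPLACED) again.<br></p>

// By the same rule, an occurrence whose next tag is a closing tag is skipped
// even though it is ordinary paragraph text.
echo replaceTextOutsideTags('<p>ends with inner text</p>', 'inner text', '(REPLACED)'), PHP_EOL;
// <p>ends with inner text</p>
```

The second call shows the flip side of the heuristic: any occurrence whose next tag is a closing tag is treated as "inside tags", including plain text right before `</p>`.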

Sample output with the default parameters:

"""
     (REPLACED TEXT HERE)
     You can find a link here link and a lot 
     of things in different styles. Public platform can appear in bold: 
     public platform, and we also have italics here too: italics. 
     While I like soft pillows I am picky about soft pillows. 
     While I want to find fox, I din't want foxes to show up.
     The text "shiny fruits" is in a span tag:  one of the shiny fruits.
     The (REPLACED TEXT HERE) like this inner inner text  here to test too, event (REPLACED TEXT HERE)
     omg thats sad... or not
     """

Now, step by step:

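`\b$toReplace\b`: word boundaries, so there are no partial matches (if the text only contains pillows, we don't want pillow to match).
If the match is followed by any number of `\w` word characters, `\s` whitespace or newlines, `>` and the allowed punctuation marks, and that run ends at the start of a closing tag `</`, we don't want this match. This is where the negative lookahead `(?![\w\n\s>$punctuation]*?<\/)` comes in (the `$excludeOutside` variable). Because `<` is not in that sequence, the lookahead can never run into a new tag: it only looks as far as the next `<`.
The `$excludeTag` variable is basically the same as `$excludeOutside`, but it ends in `>` instead of `</`. It covers the case where `$toReplace` is part of an HTML tag itself, for example a tag name such as `a`.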
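The three steps can be tried one at a time. A minimal sketch, reusing the answer's `$punctuation` list and assuming the lookaheads end in `<\/)` and `>)`; the sample strings are made up for illustration, and the first line also checks the `[[:punct:]]` remark from the code comments.

```php
<?php
// Why a hand-written class: in PCRE, [[:punct:]] also matches '<' and '>',
// so a lookahead built on it could walk straight into the next tag.
var_dump(preg_match('/[[:punct:]]/', '<')); // int(1)

// Hand-picked punctuation from the answer ('<' deliberately left out).
$punctuation = "\.,!\?:;\\|\/=\"#";
// Shared start of both lookaheads: word chars, whitespace, '>' and the
// punctuation above, but never '<'.
$excludeSequence = "(?![\w\s>$punctuation]*?";

// Step 1: \b word boundaries, so "pillow" does not match inside "pillows".
$step1 = '/\bpillow\b/i';
echo preg_match_all($step1, 'soft pillows, one pillow'), PHP_EOL; // 1

// Step 2 ($excludeOutside): reject a match when the next tag after it is a
// closing tag "</".
$step2 = '/\bshiny fruits\b' . $excludeSequence . '<\/)/i';
echo preg_replace($step2, 'X', 'shiny fruits <span>shiny fruits</span>'), PHP_EOL;
// X <span>shiny fruits</span>

// Step 3 ($excludeTag): also reject a match when a '>' follows before any
// '<', i.e. the match is inside a tag. Compare both patterns on "a".
$html = 'a link: <a href="#">here <b>now</b></a>';
$withoutTag = '/\ba\b' . $excludeSequence . '<\/)/i';
$withTag = '/\ba\b' . $excludeSequence . '<\/)' . $excludeSequence . '>)/i';
echo preg_replace($withoutTag, 'X', $html), PHP_EOL;
// X link: <X href="#">here <b>now</b></X>
echo preg_replace($withTag, 'X', $html), PHP_EOL;
// X link: <a href="#">here <b>now</b></a>
```

Without `$excludeTag`, the tag name inside `<a href="#">` is replaced because the next tag after it is `<b>`, not a closing tag; the second lookahead catches it by reaching the tag's own `>` first.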

Note that this code cannot replace text containing `<` or `>`, and using these symbols in the text may lead to unexpected behavior.
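One concrete case of that unexpected behavior, assuming the same reconstructed pattern: because `>` and `=` are both allowed in the lookahead, an ordinary `=>` in the text after the phrase makes the `$excludeTag` lookahead believe the match sits inside a tag. The strings are made up for illustration.

```php
<?php
// Same two-lookahead shape as the answer (reconstructed), for 'inner text'.
$punctuation = "\.,!\?:;\\|\/=\"#";
$excludeSequence = "(?![\w\s>$punctuation]*?";
$pattern = '/\binner text\b' . $excludeSequence . '<\/)' . $excludeSequence . '>)/i';

// '-' is not in the lookahead's class, so no '>' is reachable: replaced.
echo preg_replace($pattern, 'X', 'inner text -> done'), PHP_EOL; // X -> done

// '=' is in the punctuation list, so ' =>' lets the $excludeTag lookahead
// reach a '>': the match looks like it is inside a tag and is skipped.
echo preg_replace($pattern, 'X', 'inner text => done'), PHP_EOL; // inner text => done
```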
