Regular expression to remove spaces inside invalid HTML tags - e.g. "< / b >" should be "</b>"
P粉884667022 2023-09-02 19:56:28
I have some HTML that has been mangled by spaces inside the tags, and I want to make it valid again. For example:

< div class='test' >1 > 0 is < b >true</ b> and apples >>> bananas< / div >

should be converted to valid HTML which, when rendered, is expected to produce the same result as:

<div class='test'>1 > 0 is <b>true</b> and apples >>> bananas</div>

Any > or < in the text that is preceded or followed by spaces should remain unchanged. For example, 1 > 0 should be retained instead of being compressed to 1>0.

I realize this may require several regular expressions, which is fine.

Here is what I have so far:

<\s?\/\s* partially fixes </ b></ div > to </b></div >, but I'm still working on the rest.

I could take a heavy-handed approach, but that would also alter the text inside the tags, not only the tag names themselves.
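
To make the gap concrete, here is a minimal JavaScript sketch of what the partial pattern <\s?\/\s* does on its own. It assumes each match is replaced with "</", which the question does not state explicitly, and the variable names are only illustrative.

```javascript
// Partial fix from the question: tidy only the "</" at the start of closing tags.
// Assumption: each match is replaced with "</" (not stated in the question).
const closingStart = /<\s?\/\s*/g;

const partial = "</ b></ div >".replace(closingStart, "</");
console.log(partial); // the space before the final ">" is still there
```

The opening tags and the whitespace before each closing angle bracket are not touched by this pattern, which is the part still to be solved.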

All replies (2)
P粉323050780

There is no reasonable way to salvage a document as corrupted as the one you posted. However, assuming you replace the > and similar characters in the text with their corresponding entities (e.g. &gt;), you can feed the document you want to accept into an appropriate library, such as DOMDocument, which will handle the rest.

$input = <<<_E_
< div class='test' >1 > 0 is < b >true</ b> and apples >>> bananas< / div >
_E_;

$input = preg_replace([ '#<\s+#', '#</\s+#' ], [ '<', '</' ], $input);

$d = new DomDocument();
$d->loadHTML($input, LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD);

var_dump($d->saveHTML());

Output:

string(80) "<div class="test">1 &gt; 0 is <b>true</b> and apples &gt;&gt;&gt; bananas</div>
"
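
The two preg_replace patterns only tidy the start of each tag; the spaces before each closing > are left for the parser to deal with. As a rough illustration of just that pre-cleaning step, here is a JavaScript sketch. The function name tidyTagStarts is hypothetical, and no DOM parsing is performed here.

```javascript
// Mirrors the two PHP patterns in order: '#<\s+#' -> '<', then '#</\s+#' -> '</'.
// tidyTagStarts is a hypothetical name used only for this sketch.
function tidyTagStarts(html) {
  return html
    .replace(/<\s+/g, "<")     // "< div" -> "<div", "< / div" -> "</ div"
    .replace(/<\/\s+/g, "</"); // "</ b" -> "</b", "</ div" -> "</div"
}

const mangled = "< div class='test' >1 > 0 is < b >true</ b> and apples >>> bananas< / div >";
console.log(tidyTagStarts(mangled));
```

Note that a bare < followed by a space in the text (for example 1 < 2) would also be collapsed by the first pattern, which is one reason to encode such characters as entities first.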
P粉064448449

This regular expression also works:

It captures the valid parts of an HTML tag in four groups and replaces the whole match (including the spaces) with just those groups.

Regex101 Demo

/(<)\s*(\/?)\s*([^<>]*\S)\s*(>)/g

  • (<) - captures the opening angle bracket (group 1)
  • \s* - matches any whitespace after it
  • (\/?) - captures an optional forward slash (group 2)
  • \s* - matches any whitespace after the slash
  • ([^<>]*\S) - captures the content within the tag without trailing whitespace (group 3)
  • \s* - matches whitespace after the content and before the closing angle bracket
  • (>) - captures the closing angle bracket (group 4)

const reg = /(<)\s*(\/?)\s*([^<>]*\S)\s*(>)/g
const str = "< div class='test' >1 > 0 is < b >true< / b > and apples >>> bananas< / div  >"
// Rebuild each tag from the four captured groups, dropping the whitespace between them.
const newStr = str.replace(reg, "$1$2$3$4");
console.log(newStr);
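
As a quick check of the requirement that text such as 1 > 0 stays untouched, the same pattern can be wrapped in a small helper and run on both inputs. This is only a sketch; fixTagSpaces is a hypothetical name and is not part of the answer.

```javascript
// Hypothetical helper wrapping the pattern from the answer above.
function fixTagSpaces(html) {
  const tagPattern = /(<)\s*(\/?)\s*([^<>]*\S)\s*(>)/g;
  // Keep only the four captured groups; the whitespace between them is dropped.
  return html.replace(tagPattern, "$1$2$3$4");
}

const fixed = fixTagSpaces("< div class='test' >1 > 0 is < b >true< / b > and apples >>> bananas< / div  >");
console.log(fixed);

// Text that contains no "<" is never matched, so comparisons keep their spacing.
console.log(fixTagSpaces("1 > 0"));
```

One caveat: a bare < in the text that is later followed by > (for example 1 < 2 > 0) would be treated as a tag by this pattern, so it is likely safest on input where such characters are already entity-encoded.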