Keep specific html tags when splitting string
P粉841870942 2024-03-31 18:12:42

I need to split a string on a specific set of tags (<li>, <ul>, ...). I came up with the regular expression

pattern = <li>|<ul>|<ol>|<dl>|<dt>|<dd>|<h1>|<h2>|<h3>|<h4>|<h5>|<h6> and used it with re.split.

This basically gets the job done:

import re
pattern = r'<li>|<ul>|<ol>|<dl>|<dt>|<dd>|<h1>|<h2>|<h3>|<h4>|<h5>|<h6>'
test_string = '<p> Some text some text some text. </p> <p> Another text another text </p>. <li> some list </li>. <ul> another list </ul>'
res = re.split(pattern, test_string)  # re.split takes the pattern first, then the string
-> `['<p> Some text some text some text. </p> <p> Another text another text </p>. ', ' some list </li>. ', ' another list </ul>']`

But I want to capture both the opening and closing tags and keep them in the split text, something like this:

`['<p> Some text some text some text. </p> <p> Another text another text </p>. ', '<li> some list </li>. ', '<ul> another list </ul>']`


Replies (1)
P粉787806024

To answer your specific questions:

[^

Use it to match rather than split.

\1 is a backreference to whatever was captured in the opening tag, so the closing tag has to have the same name.

Similar to:

for match in re.finditer(r"[^", subject, re.DOTALL):
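The pattern in this reply is cut off on this page, so what follows is only a sketch of what the description suggests, not the original regex. It assumes a pattern that captures the tag name in the opening tag, matches the body lazily, and uses \1 to require the matching closing tag. The names TAGS, ELEMENT_RE and SPLIT_RE are made up for illustration. Since matching alone drops the text between elements, the sketch also shows a different technique, a zero-width lookahead split (Python 3.7+), which produces the exact list asked for in the question.

```python
import re

# Tag names taken from the pattern in the question.
TAGS = r"li|ul|ol|dl|dt|dd|h[1-6]"

# Assumed pattern (not the one from the reply): opening tag with the name
# captured in group 1, lazily matched body, closing tag that repeats group 1.
ELEMENT_RE = re.compile(rf"<({TAGS})>.*?</\1>", re.DOTALL)

# Alternative technique: a lookahead matches the empty position *before* an
# opening tag, so splitting there consumes nothing and the tags stay in place.
SPLIT_RE = re.compile(rf"(?=<(?:{TAGS})>)")

test_string = (
    "<p> Some text some text some text. </p> "
    "<p> Another text another text </p>. "
    "<li> some list </li>. <ul> another list </ul>"
)

# Matching: only the listed elements, each with its opening and closing tag.
elements = [m.group(0) for m in ELEMENT_RE.finditer(test_string)]
print(elements)
# ['<li> some list </li>', '<ul> another list </ul>']

# Splitting before each opening tag: every character of the input is kept.
pieces = SPLIT_RE.split(test_string)
print(pieces)
```

Neither pattern handles attributes on the tags or an element nested inside another element of the same name (for example a <ul> inside a <ul>); the lazy match would stop at the first closing tag.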

However, in most real cases this is not sufficient to handle HTML and you should consider a DOM parser.
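As a rough illustration of the parser route, here is a sketch using html.parser from the Python standard library. It is an event-based parser rather than a DOM, but it shows the idea of letting a parser track which tags are open; a DOM library would work along similar lines. The class name ElementCollector and the TARGET_TAGS set are hypothetical, and the sketch rebuilds closing tags itself, so unusual spellings such as "</LI >" come out normalized and character references in the text are decoded.

```python
from html.parser import HTMLParser

# Tag names taken from the pattern in the question.
TARGET_TAGS = {"li", "ul", "ol", "dl", "dt", "dd",
               "h1", "h2", "h3", "h4", "h5", "h6"}


class ElementCollector(HTMLParser):
    """Collect the markup of each outermost element whose tag is in TARGET_TAGS."""

    def __init__(self):
        super().__init__()
        self.elements = []  # finished elements, as strings
        self._buffer = []   # pieces of the element currently being captured
        self._open = []     # stack of target tags that are currently open

    def handle_starttag(self, tag, attrs):
        if tag in TARGET_TAGS:
            self._open.append(tag)
        if self._open:
            # Raw text of the start tag, attributes included.
            self._buffer.append(self.get_starttag_text())

    def handle_startendtag(self, tag, attrs):
        # Self-closing tags such as <br/> are copied through unchanged.
        if self._open:
            self._buffer.append(self.get_starttag_text())

    def handle_endtag(self, tag):
        if not self._open:
            return
        self._buffer.append(f"</{tag}>")
        if tag == self._open[-1]:
            self._open.pop()
            if not self._open:
                # The outermost target element just closed.
                self.elements.append("".join(self._buffer))
                self._buffer = []

    def handle_data(self, data):
        if self._open:
            self._buffer.append(data)


test_string = (
    "<p> Some text some text some text. </p> "
    "<p> Another text another text </p>. "
    "<li> some list </li>. <ul> another list </ul>"
)

collector = ElementCollector()
collector.feed(test_string)
collector.close()
print(collector.elements)
# ['<li> some list </li>', '<ul> another list </ul>']
```

Unlike the regex, this keeps a nested list such as "<ul><li>a</li><li>b</li></ul>" together as a single element, because the stack only empties when the outermost tag closes.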
