<p>In web development, we often need to extract the text content from HTML. PHP's regular expressions are one way to do this. Regular expressions are a pattern language for matching strings and can be used to process HTML markup, filter text, validate forms, and more. </p>
<p>This article shows how to use PHP regular expressions to extract all the text content from an HTML document. </p>
<ol><li>Get HTML file contents</li></ol>
<p>First, we use PHP's file reading function <code>file_get_contents()</code> to read the contents of the HTML file. For example, an HTML file named <code>example.html</code> can be read with the following code: </p><div class="code" style="position:relative; padding:0px; margin:0px;"><pre class='brush:php;toolbar:false;'>$html = file_get_contents("example.html");</pre></div><ol start="2"><li>Write a regular expression</li></ol><p>Next, we write a regular expression that separates the text from the markup. In HTML, text content sits between tags, so we can extract it by matching the tags and removing them. </p><p>The following simple regular expression matches any HTML tag: </p><div class="code" style="position:relative; padding:0px; margin:0px;"><pre class='brush:php;toolbar:false;'>$pattern = '/<[^>]*>/';</pre></div><p>This pattern matches a sequence of characters that starts with <code>&lt;</code> and ends with <code>&gt;</code>, with no <code>&gt;</code> character in between. </p><p>We can then use the <code>preg_replace()</code> function to replace every match with an empty string, which leaves only the text content: </p><div class="code" style="position:relative; padding:0px; margin:0px;"><pre class='brush:php;toolbar:false;'>$text = preg_replace($pattern, '', $html);</pre></div><ol start="3"><li>Filter special characters</li></ol><p>After the tags are removed, some cleanup remains. PHP's <code>strip_tags()</code> function removes any tags that are still present, and <code>trim()</code> removes whitespace, including line breaks and tabs, from both ends of the string. Neither function touches line breaks or tabs inside the text. 
</p><div class="code" style="position:relative; padding:0px; margin:0px;"><pre class='brush:php;toolbar:false;'>$text = strip_tags($text);
$text = trim($text);</pre></div><p>At this point, <code>$text</code> holds all the text content of the HTML. </p><p>The complete code is as follows: </p><div class="code" style="position:relative; padding:0px; margin:0px;"><pre class='brush:php;toolbar:false;'>$html = file_get_contents("example.html");
$pattern = '/<[^>]*>/';
$text = preg_replace($pattern, '', $html);
$text = strip_tags($text);
$text = trim($text);
echo $text;</pre></div><p>Summary</p>
<p>Extracting text content from HTML with PHP regular expressions is a common operation, and the steps above implement it in a few lines. Note, however, that regular expressions are only a basic matching tool. The simple pattern used here does not handle cases such as a <code>&gt;</code> character inside an attribute value or the contents of <code>&lt;script&gt;</code> and <code>&lt;style&gt;</code> elements, so complex HTML fragments may need more elaborate matching methods. </p>
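<p>As a hedged illustration of what handling more complex HTML can look like, the sketch below extends the steps above using only regular expressions and core PHP functions. The helper name <code>extractText()</code> and the sample string are hypothetical and not part of the original steps. The sketch assumes PHP 7 or later and valid UTF-8 input, and it still does not cope with malformed markup. </p><div class="code" style="position:relative; padding:0px; margin:0px;"><pre class='brush:php;toolbar:false;'>&lt;?php
// Hypothetical helper, not part of the original steps: extracts the visible
// text from an HTML string. Assumes PHP 7+ and valid UTF-8 input.
function extractText(string $html): string
{
    // Remove script and style elements together with their contents, then
    // HTML comments; the simple tag pattern alone would leave their bodies.
    $html = preg_replace('#&lt;(script|style)\b[^&gt;]*&gt;.*?&lt;/\1\s*&gt;#is', ' ', $html);
    $html = preg_replace('/&lt;!--.*?--&gt;/s', ' ', $html);

    // Replace each remaining tag with a space so that words in adjacent
    // elements do not run together.
    $text = preg_replace('/&lt;[^&gt;]*&gt;/', ' ', $html);

    // Decode entities such as &amp;amp; and &amp;nbsp; into real characters.
    $text = html_entity_decode($text, ENT_QUOTES | ENT_HTML5, 'UTF-8');

    // Collapse runs of whitespace (line breaks, tabs, and the no-break
    // space produced by &amp;nbsp;) into a single space.
    $text = preg_replace('/[\s\x{00A0}]+/u', ' ', $text);

    return trim($text);
}

// Sample input, made up for this example.
$sample = "&lt;html&gt;&lt;head&gt;&lt;style&gt;p { color: red; }&lt;/style&gt;&lt;/head&gt;"
        . "&lt;body&gt;&lt;p&gt;Hello,\n\t&lt;b&gt;world&lt;/b&gt; &amp;amp; friends&lt;/p&gt;&lt;script&gt;alert(1);&lt;/script&gt;&lt;/body&gt;&lt;/html&gt;";

echo extractText($sample), PHP_EOL; // prints: Hello, world &amp; friends</pre></div><p>Replacing tags with a space rather than an empty string is a design choice: it keeps words in adjacent block elements apart, at the cost of splitting a word that has inline markup in the middle. </p>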