Home Backend Development PHP Tutorial How can I extract and categorize text data from an HTML document based on specific element classes using PHP?

How can I extract and categorize text data from an HTML document based on specific element classes using PHP?

Nov 12, 2024 pm 03:48 PM

How can I extract and categorize text data from an HTML document based on specific element classes using PHP?

Retrieve Text from Elements with Specified Class as a Comprehensive Array

In this query, the task at hand is to extract and categorize text data from an HTML document based on specific element classes. The HTML document contains various paragraphs with classes like "Heading1-P" and "Normal-P," each containing corresponding headings and content.

To accomplish this, we can utilize PHP DOM Document and XPath. The process involves parsing the HTML document and traversing its elements using XPath. We define a custom function, parseToArray() that takes an XPath object and class name as inputs. This function iterates through the elements matching the class and extracts their text content into an array.

Here's the detailed solution:

$test = <<< HTML
<p class="Heading1-P">
    <span class="Heading1-H">Chapter 1</span>
</p>
<p class="Normal-P">
    <span class="Normal-H">This is chapter 1</span>
</p>
<p class="Heading1-P">
    <span class="Heading1-H">Chapter 2</span>
</p>
<p class="Normal-P">
    <span class="Normal-H">This is chapter 2</span>
</p>
<p class="Heading1-P">
    <span class="Heading1-H">Chapter 3</span>
</p>
<p class="Normal-P">
    <span class="Normal-H">This is chapter 3</span>
</p>
HTML;

$dom = new DOMDocument();
$dom->loadHTML($test);
$xpath = new DOMXPath($dom);
$heading = parseToArray($xpath, 'Heading1-H');
$content = parseToArray($xpath, 'Normal-H');

var_dump($heading);
echo "<br/>";
var_dump($content);
echo "<br/>";

function parseToArray(DOMXPath $xpath, string $class): array
{
    $xpathquery = "//[@class='$class']";
    $elements = $xpath->query($xpathquery);

    $resultarray = [];
    foreach ($elements as $element) {
        $nodes = $element->childNodes;
        foreach ($nodes as $node) {
            $resultarray[] = $node->nodeValue;
        }
    }

    return $resultarray;
}
Copy after login

The function parseToArray() identifies elements based on a specific class name and extracts their text content into an array. Subsequently, two arrays are created: $heading and $content, which contain the chapter titles and corresponding paragraph text, respectively. The output of the code will be as follows:

array(3) {
  [0] =>
  string(8) "Chapter 1"
  [1] =>
  string(8) "Chapter 2"
  [2] =>
  string(8) "Chapter 3"
}
array(3) {
  [0] =>
  string(16) "This is chapter 1"
  [1] =>
  string(16) "This is chapter 2"
  [2] =>
  string(16) "This is chapter 3"
}
Copy after login

By employing this approach, you can efficiently retrieve and separate text content based on specific class names from an HTML document, allowing for flexible and targeted data processing.

The above is the detailed content of How can I extract and categorize text data from an HTML document based on specific element classes using PHP?. For more information, please follow other related articles on the PHP Chinese website!

Statement of this Website
The content of this article is voluntarily contributed by netizens, and the copyright belongs to the original author. This site does not assume corresponding legal responsibility. If you find any content suspected of plagiarism or infringement, please contact admin@php.cn

Hot Article Tags

Notepad++7.3.1

Notepad++7.3.1

Easy-to-use and free code editor

SublimeText3 Chinese version

SublimeText3 Chinese version

Chinese version, very easy to use

Zend Studio 13.0.1

Zend Studio 13.0.1

Powerful PHP integrated development environment

Dreamweaver CS6

Dreamweaver CS6

Visual web development tools

SublimeText3 Mac version

SublimeText3 Mac version

God-level code editing software (SublimeText3)

11 Best PHP URL Shortener Scripts (Free and Premium) 11 Best PHP URL Shortener Scripts (Free and Premium) Mar 03, 2025 am 10:49 AM

11 Best PHP URL Shortener Scripts (Free and Premium)

Working with Flash Session Data in Laravel Working with Flash Session Data in Laravel Mar 12, 2025 pm 05:08 PM

Working with Flash Session Data in Laravel

Build a React App With a Laravel Back End: Part 2, React Build a React App With a Laravel Back End: Part 2, React Mar 04, 2025 am 09:33 AM

Build a React App With a Laravel Back End: Part 2, React

Simplified HTTP Response Mocking in Laravel Tests Simplified HTTP Response Mocking in Laravel Tests Mar 12, 2025 pm 05:09 PM

Simplified HTTP Response Mocking in Laravel Tests

cURL in PHP: How to Use the PHP cURL Extension in REST APIs cURL in PHP: How to Use the PHP cURL Extension in REST APIs Mar 14, 2025 am 11:42 AM

cURL in PHP: How to Use the PHP cURL Extension in REST APIs

12 Best PHP Chat Scripts on CodeCanyon 12 Best PHP Chat Scripts on CodeCanyon Mar 13, 2025 pm 12:08 PM

12 Best PHP Chat Scripts on CodeCanyon

Announcement of 2025 PHP Situation Survey Announcement of 2025 PHP Situation Survey Mar 03, 2025 pm 04:20 PM

Announcement of 2025 PHP Situation Survey

Notifications in Laravel Notifications in Laravel Mar 04, 2025 am 09:22 AM

Notifications in Laravel

See all articles