How to Read and Extract Text from DOCX Files in PHP?

Mary-Kate Olsen
Release: 2024-10-28 17:43:29

Reading DOCX Files in PHP

Reading a DOCX file in PHP as if it were plain text, for example with file_get_contents(), produces garbled characters. A DOCX file is a ZIP archive that holds the document as XML parts, so it has to be opened as an archive before the text can be read. The following code opens the archive and extracts the text:

<code class="php">function read_file_docx($filename){

    // Reject empty paths and files that do not exist.
    if(!$filename || !file_exists($filename)) return false;

    // Note: the procedural zip_* functions are deprecated as of PHP 8.0.
    $zip = zip_open($filename);

    // zip_open() returns an integer error code (or false) on failure.
    if (!$zip || is_numeric($zip)) return false;

    $content = '';

    while ($zip_entry = zip_read($zip)) {

        // Only the main document part holds the body text; skip everything else.
        if (zip_entry_name($zip_entry) != "word/document.xml") continue;

        if (zip_entry_open($zip, $zip_entry) == FALSE) continue;

        $content .= zip_entry_read($zip_entry, zip_entry_filesize($zip_entry));

        zip_entry_close($zip_entry);
    }// end while

    zip_close($zip);

    // Put a space between adjacent table cells and a line break after each paragraph.
    $content = str_replace('</w:r></w:p></w:tc><w:tc>', " ", $content);
    $content = str_replace('</w:r></w:p>', "\r\n", $content);

    // Remove the remaining XML tags, leaving only the text.
    $stripped_content = strip_tags($content);

    return $stripped_content;
}
$filename = "filepath";// or /var/www/html/file.docx

$content = read_file_docx($filename);
if($content !== false) {

    echo nl2br($content);
}
else {
    echo 'Couldn\'t read the file. Please check that file.';
}</code>

This code uses the PHP ZIP extension to open the DOCX file as a ZIP archive. It then locates the "word/document.xml" entry, which contains the document's body text as XML. Before the tags are removed, the closing tags between adjacent table cells are replaced with a space and the closing tags of each paragraph with a line break, so that cells and paragraphs stay separated. strip_tags() then removes the remaining XML tags. The resulting text can be displayed or processed as needed.
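
The zip_open() family used above is deprecated as of PHP 8.0, so on newer PHP versions the same steps may be easier to express with the ZipArchive class from the same extension. The following is a minimal sketch, not the article's original method; the function name docx_to_text() and the choice to return null on failure are assumptions made for illustration.

```php
<?php
// Hypothetical helper (name chosen for this sketch): reads word/document.xml
// from a DOCX file with ZipArchive and applies the same tag handling as above.
function docx_to_text(string $filename): ?string
{
    if ($filename === '' || !is_file($filename)) {
        return null;
    }

    $zip = new ZipArchive();
    // open() returns true on success and an integer error code otherwise.
    if ($zip->open($filename) !== true) {
        return null;
    }

    // getFromName() returns the entry contents, or false if the entry is missing.
    $xml = $zip->getFromName('word/document.xml');
    $zip->close();

    if ($xml === false) {
        return null;
    }

    // Space between adjacent table cells, line break after each paragraph.
    $xml = str_replace('</w:r></w:p></w:tc><w:tc>', ' ', $xml);
    $xml = str_replace('</w:r></w:p>', "\r\n", $xml);

    // Remove the remaining XML tags.
    return strip_tags($xml);
}
```

A caller would check for null before printing, for example `$text = docx_to_text('/var/www/html/file.docx');` followed by `if ($text !== null) { echo nl2br($text); }`.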

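Replacing tag strings and calling strip_tags() only works when the closing tags appear in exactly the expected order, and it leaves XML entities such as &amp;amp; encoded in the output. As an alternative sketch, the XML from word/document.xml could be parsed with PHP's DOM extension and the text collected from the w:t elements of each w:p paragraph. The function name document_xml_to_text() is hypothetical, and joining paragraphs with "\n" is an assumption of this sketch.

```php
<?php
// Hypothetical helper (name chosen for this sketch): takes the XML string of
// word/document.xml and returns one line of text per paragraph.
function document_xml_to_text(string $xml): ?string
{
    if ($xml === '') {
        return null;
    }

    $dom = new DOMDocument();
    // LIBXML_NONET disables network access during parsing;
    // @ suppresses parser warnings so malformed input just yields null.
    if (!@$dom->loadXML($xml, LIBXML_NONET)) {
        return null;
    }

    $xpath = new DOMXPath($dom);
    // Namespace URI of WordprocessingML's main schema.
    $xpath->registerNamespace(
        'w',
        'http://schemas.openxmlformats.org/wordprocessingml/2006/main'
    );

    $lines = [];
    foreach ($xpath->query('//w:p') as $paragraph) {
        $text = '';
        // A paragraph's text is split across runs; w:t holds the text of each run.
        foreach ($xpath->query('.//w:t', $paragraph) as $run) {
            $text .= $run->textContent;
        }
        $lines[] = $text;
    }

    return implode("\n", $lines);
}
```

Because the parser decodes entities, a run containing `Tom &amp; Jerry` comes back as `Tom & Jerry`. This sketch ignores tabs, explicit line breaks, headers, footers, and footnotes, which live in other elements or other parts of the archive.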

source:php.cn