Reading DOCX Files in PHP
If you read a DOCX file in PHP with a plain file function such as file_get_contents() or fopen(), the output is full of garbled characters. This happens because a DOCX file is not plain text. It is a compressed ZIP package of XML files, so the text has to be extracted from the archive and separated from its markup. The following code shows how to read a DOCX file and extract its text in PHP:
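To see why a raw read looks garbled, you can check the first bytes of the file. A non-empty ZIP archive, which every DOCX file is, normally begins with a local file header whose signature is the four bytes `PK\x03\x04`. The helper below, `looks_like_zip()`, is a hypothetical name used only for illustration. It is a minimal sketch that checks the signature only and does not validate the rest of the archive.

```php
<?php
// Hypothetical helper (not part of any library): reports whether a file
// begins with the ZIP local file header signature "PK\x03\x04".
// Only the first four bytes are inspected; the archive is not validated.
function looks_like_zip(string $path): bool
{
    // Suppress the warning fopen() emits for a missing or unreadable file.
    $handle = @fopen($path, 'rb');
    if ($handle === false) {
        return false;
    }
    $magic = fread($handle, 4);
    fclose($handle);
    return $magic === "PK\x03\x04";
}
```

For example, `looks_like_zip('file.docx')` should return true for a genuine DOCX file and false for a plain-text file.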
<code class="php"><?php
/**
 * Extract plain text from the main body of a DOCX file.
 * Returns false if the file is missing or is not a readable ZIP package.
 */
function read_file_docx($filename)
{
    if (!$filename || !file_exists($filename)) {
        return false;
    }

    // A DOCX file is a ZIP package. ZipArchive replaces the procedural
    // zip_open()/zip_read() functions, which are deprecated as of PHP 8.0.
    $zip = new ZipArchive();
    if ($zip->open($filename) !== true) {
        return false;
    }

    // The main document text is stored in word/document.xml.
    $content = $zip->getFromName('word/document.xml');
    $zip->close();
    if ($content === false) {
        return false;
    }

    // Put a space between adjacent table cells and a line break
    // after each paragraph that ends with a text run.
    $content = str_replace('</w:r></w:p></w:tc><w:tc>', " ", $content);
    $content = str_replace('</w:r></w:p>', "\r\n", $content);

    // Remove the remaining XML markup, then decode entities such as &amp;.
    $stripped_content = strip_tags($content);
    return html_entity_decode($stripped_content, ENT_QUOTES | ENT_XML1, 'UTF-8');
}

$filename = "filepath"; // e.g. /var/www/html/file.docx
$content = read_file_docx($filename);
if ($content !== false) {
    // Escape the text for HTML output and turn newlines into line breaks.
    echo nl2br(htmlspecialchars($content, ENT_QUOTES, 'UTF-8'));
} else {
    echo 'Couldn\'t read the file. Please check the file path.';
}
</code>
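This code uses the PHP Zip extension to open the DOCX file as a ZIP package and read "word/document.xml", the part that holds the main document text. It then replaces the closing markup of each paragraph (`</w:r></w:p>`) with a line break and the boundary between table cells with a space, and finally removes the remaining XML tags with strip_tags(). The result is plain text that can be displayed or processed further. Text in headers, footers, footnotes, and comments is stored in other parts of the package (for example "word/header1.xml") and is not extracted.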
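The str_replace() approach depends on the exact sequence of closing tags, so a paragraph that ends with something other than `</w:r></w:p>` (for example a bookmark marker) gets no line break. A sketch of an alternative, assuming the DOM extension is available, parses the XML with DOMDocument and DOMXPath and builds one line per `w:p` paragraph from its `w:t` text elements. The function name `docx_xml_to_text()` is hypothetical, and the sketch ignores tabs, explicit line breaks, and table layout.

```php
<?php
// Hypothetical helper: convert the XML of word/document.xml to plain text.
// Each <w:p> paragraph in the body becomes one line, built from its <w:t>
// elements. Tabs (<w:tab/>), breaks (<w:br/>) and table layout are ignored.
function docx_xml_to_text(string $xml): string
{
    if ($xml === '') {
        return ''; // loadXML() rejects an empty string
    }

    $dom = new DOMDocument();
    // LIBXML_NONET prevents network access while parsing.
    if (!$dom->loadXML($xml, LIBXML_NONET)) {
        return '';
    }

    $xpath = new DOMXPath($dom);
    $xpath->registerNamespace(
        'w',
        'http://schemas.openxmlformats.org/wordprocessingml/2006/main'
    );

    $lines = [];
    foreach ($xpath->query('//w:body//w:p') as $paragraph) {
        $text = '';
        // textContent already decodes entities such as &amp;.
        foreach ($xpath->query('.//w:t', $paragraph) as $t) {
            $text .= $t->textContent;
        }
        $lines[] = $text;
    }
    return implode("\n", $lines);
}
```

The string passed in would be the contents of "word/document.xml", for example the value returned by ZipArchive::getFromName() after checking that it is not false.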