Ensuring UTF-8 File Encoding for File Conversion
When converting a site to UTF-8, every file has to be stored in that encoding, yet existing files may still use a legacy encoding. This article looks at how to write files as UTF-8, based on a question about a conversion script that did not behave as expected.
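A simple script was used to convert the encoding, but the resulting files kept their original encoding. Neither header() nor mb_internal_encoding() changes the bytes written to disk, and file_get_contents() followed by file_put_contents() copies the data unchanged. The following script was used: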
header('Content-type: text/html; charset=utf-8');
mb_internal_encoding('UTF-8');

$fpath = "folder";
$d = dir($fpath);
while (false !== ($a = $d->read())) {
    if ($a != '.' and $a != '..') {
        $npath = $fpath . '/' . $a;
        $data = file_get_contents($npath);
        file_put_contents('tempfolder/' . $a, $data);
    }
}
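One way to confirm that a copy like this changes nothing is to inspect the bytes directly. The sketch below is only an illustration: the helper names hasUtf8Bom and isValidUtf8 are hypothetical, and it assumes the mbstring extension is available for mb_check_encoding().

```php
<?php
// Hypothetical helpers for illustration; they are not part of the original script.

/** Returns true when the data starts with the UTF-8 BOM (EF BB BF). */
function hasUtf8Bom(string $data): bool
{
    return strncmp($data, "\xEF\xBB\xBF", 3) === 0;
}

/** Returns true when the data is a valid UTF-8 byte sequence. */
function isValidUtf8(string $data): bool
{
    return mb_check_encoding($data, 'UTF-8');
}

$latin1 = "caf\xE9";      // "café" in ISO-8859-1: a lone 0xE9 byte is not valid UTF-8
$utf8   = "caf\xC3\xA9";  // the same word encoded as UTF-8

var_dump(isValidUtf8($latin1)); // bool(false)
var_dump(isValidUtf8($utf8));   // bool(true)
var_dump(hasUtf8Bom($utf8));    // bool(false)
```

Running the same checks on a file before and after the copy should give identical results, since the script never touches the bytes.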
To have the output recognized as UTF-8, prepend the UTF-8 byte order mark (BOM) when writing each file:
file_put_contents($myFile, "\xEF\xBB\xBF". $content);
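This prefix lets editors and other tools recognize the file as UTF-8. Note that the BOM only labels the file; it does not re-encode the content. Pure ASCII content is already valid UTF-8, so adding the BOM is enough in that case. Content that contains non-ASCII characters in a legacy encoding must also be converted, for example with mb_convert_encoding() or iconv(), before it is written.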
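Putting the pieces together, a conversion pass might look like the sketch below. It is not the original author's code: the function names, the UTF8_BOM constant, and the ISO-8859-1 source encoding are assumptions made for illustration, and the validity check is a heuristic, since some legacy byte sequences can happen to be valid UTF-8.

```php
<?php
// A minimal sketch under stated assumptions: the names below and the
// ISO-8859-1 source encoding are hypothetical, chosen for illustration.

const UTF8_BOM = "\xEF\xBB\xBF";

/**
 * Converts a string from $sourceEncoding to UTF-8 and prepends the BOM.
 * Data that already starts with a BOM is returned unchanged.
 */
function toUtf8WithBom(string $data, string $sourceEncoding): string
{
    if (strncmp($data, UTF8_BOM, 3) === 0) {
        return $data; // already marked as UTF-8
    }
    // Re-encode only when the bytes are not already valid UTF-8.
    if (!mb_check_encoding($data, 'UTF-8')) {
        $data = mb_convert_encoding($data, 'UTF-8', $sourceEncoding);
    }
    return UTF8_BOM . $data;
}

/** Converts every regular file in $sourceDir and writes the result to $targetDir. */
function convertFolder(string $sourceDir, string $targetDir, string $sourceEncoding): void
{
    foreach (scandir($sourceDir) ?: [] as $name) {
        $path = $sourceDir . '/' . $name;
        if (!is_file($path)) {
            continue; // skips '.', '..' and subdirectories
        }
        $data = file_get_contents($path);
        file_put_contents($targetDir . '/' . $name, toUtf8WithBom($data, $sourceEncoding));
    }
}

// Example call, using the folder names from the script above:
// convertFolder('folder', 'tempfolder', 'ISO-8859-1');
```

Writing to a separate target folder, as the original script does, keeps the source files intact in case the assumed source encoding turns out to be wrong.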