
Fixing page display problems caused by the UTF-8 BOM when PHP reads a CSV file

WBOY
Release: 2016-07-21 14:58:49
Original

date.csv:
"ID" "NAME" "EMAIL"
"1" "Xiao Ming" "xm@163.com"
"2" "Xiaodong " "xd@sina.com"
"3" "小少" "shaozi@hotmai.com"

The following code reads this CSV file:

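<?php
// Open the tab-separated file for reading.
$handle = fopen('date.csv', 'r');
// The tab delimiter is "\t" (not "/t"); fgetcsv() returns false at end of file.
while (($data = fgetcsv($handle, 10000, "\t")) !== false)
{
    echo $data[0] . $data[1] . $data[2];
}
fclose($handle);
?>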

After being read and displayed on the page, it looks like this:
"ID" NAME EMAIL
1 Xiao Ming xm@163.com
2 Xiaodong xd@sina.com
3 Xiaoshao shaozi@hotmai.com
The default field enclosure character of the fgetcsv function is the double quote.
So why are all the other fields read correctly, while ID is still wrapped in double quotes?

After searching online, it turns out that PHP does not recognize the UTF-8 BOM. The three BOM bytes sit in front of the opening quote of the first field, so fgetcsv does not treat that field as enclosed and leaves its quotes in place.
The following is the information I found:
The Unicode specification defines a BOM (Byte Order Mark). Here is a description of the BOM:
UCS defines a character called "ZERO WIDTH NO-BREAK SPACE", whose code is FEFF. FFFE, by contrast, is not a character in UCS, so it should never appear in actual transmission. The UCS specification recommends transmitting the character "ZERO WIDTH NO-BREAK SPACE" before the byte stream itself. If the receiver then sees FE FF, the byte stream is Big-Endian; if it sees FF FE, the byte stream is Little-Endian. For this reason the character "ZERO WIDTH NO-BREAK SPACE" is also called the BOM.
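The byte-order rule described above can be sketched in PHP roughly as follows. This is only an illustration: the function name detectByteOrder is hypothetical, and it inspects nothing beyond the first two bytes of the input.

```php
<?php
// Hypothetical helper: guess the byte order of a UCS-2/UTF-16 byte stream
// from its first two bytes. Returns null when no BOM is present.
function detectByteOrder(string $bytes): ?string
{
    $head = substr($bytes, 0, 2);
    if ($head === "\xFE\xFF") {
        return 'big-endian';    // FE FF received first
    }
    if ($head === "\xFF\xFE") {
        return 'little-endian'; // FF FE received first
    }
    return null;                // no byte order mark found
}
```

A stream without a BOM yields null, in which case the byte order has to be known from some other source.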

UTF-8 does not require a BOM to indicate the byte order, but can use the BOM to indicate the encoding method. The UTF-8 encoding of the character "ZERO WIDTH NO-BREAK SPACE" is EF BB BF. So if the receiver receives a byte stream starting with EF BB BF, it knows that it is UTF-8 encoded.
Windows uses BOM to mark the encoding method of text files.
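Since the UTF-8 BOM is always the three bytes EF BB BF, one possible fix for the CSV problem above is to skip those bytes before calling fgetcsv. The following is a minimal sketch, not a definitive implementation; the helper names openWithoutBom and readTsvRows are hypothetical, and the file is assumed to be tab-separated like date.csv.

```php
<?php
// Hypothetical helper: open a file and leave the file pointer just past
// a UTF-8 BOM (EF BB BF) if the file starts with one.
function openWithoutBom(string $path)
{
    $handle = fopen($path, 'r');
    if ($handle === false) {
        return false;
    }
    // Read the first three bytes; go back to the start if they are not the BOM.
    if (fread($handle, 3) !== "\xEF\xBB\xBF") {
        rewind($handle);
    }
    return $handle;
}

// Hypothetical helper: read all rows of a tab-separated file into an array.
function readTsvRows(string $path): array
{
    $rows = [];
    $handle = openWithoutBom($path);
    if ($handle === false) {
        return $rows;
    }
    // Delimiter, enclosure, and escape character are passed explicitly.
    while (($data = fgetcsv($handle, 10000, "\t", '"', '\\')) !== false) {
        $rows[] = $data;
    }
    fclose($handle);
    return $rows;
}
```

With the BOM skipped, the first field begins with the double quote, so fgetcsv can strip the enclosure from ID just as it does for the other fields.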

In addition, the BOM FAQ on the Unicode website covers the BOM in detail. It is the authoritative source, but it is only available in English, which makes it harder going.
In UTF-8 encoded files, the BOM occupies three bytes. If you save a text file as UTF-8 in Notepad, open it in UltraEdit (UE), and switch to hexadecimal editing mode, you can see EF BB BF at the beginning. This is a good way to identify UTF-8 encoded files. Some software uses the BOM to decide whether a file is UTF-8 encoded, and much software even requires the files it reads to have a BOM. However, there is still a lot of software that cannot recognize the BOM. When I was studying Firefox, I learned that in early versions extensions could not contain a BOM, while Firefox 1.5 and later support it. Now I have discovered that PHP does not support the BOM either.
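The hex-editor check can also be approximated in PHP by dumping the first bytes of the data as hexadecimal. This is a small sketch; leadingBytesHex is a hypothetical helper name, and it works on a string that has already been read, for example with file_get_contents.

```php
<?php
// Hypothetical helper: return the first $count bytes of a string as
// space-separated uppercase hex pairs, similar to a hex editor view.
function leadingBytesHex(string $bytes, int $count = 3): string
{
    $hex = bin2hex(substr($bytes, 0, $count));
    return strtoupper(implode(' ', str_split($hex, 2)));
}

// A string that starts with the UTF-8 BOM, followed by the first field.
echo leadingBytesHex("\xEF\xBB\xBF\"ID\""), "\n"; // prints "EF BB BF"
```

If the result for a file's contents is anything other than EF BB BF, the file does not start with a UTF-8 BOM.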

PHP was not designed with the BOM in mind, which means it does not skip the three BOM bytes at the beginning of a UTF-8 encoded file. You therefore have to remove the BOM yourself, either by converting the file from UTF-8 to ASCII or by selecting ASCII encoding in Save As. If the file uses DOS-format line endings, you can open it with Notepad, click Save As, and select ASCII encoding. If it contains Chinese characters, you can use UltraEdit's Save As function and select "UTF-8 without BOM".


According to Bo-Blog's wiki, in EditPlus the file has to be saved as GB first and then saved as UTF-8. Be careful when doing this, because every character that is not in the GBK encoding will be lost. If the file contains characters outside that encoding, it is better not to use this method. (In this small respect UltraEdit-32 is indeed much better than EditPlus, which is too lightweight.)

I also found another way, which is to use the file editor that comes with WordPress. This method has no such restrictions and does not require downloading a special editor, which is convenient for anyone already using WordPress. First, use FTP to make the file you want to edit writable, then go to the WordPress backend->Management->File Editor, enter the path of the file, and click Edit File. The first three bytes are not visible in the editing interface, but that does not matter. Place the cursor in front of the first character of the file and press the Backspace key. Then click Update File and refresh the listing in FTP: the file is now 3 bytes smaller, and you are done.
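The same three-byte removal can be scripted in PHP instead of being done by hand in an editor. The sketch below is one possible approach, not the method the article itself uses; the names stripUtf8Bom and removeBomFromFile are hypothetical, and the whole file is assumed to be small enough to load into memory.

```php
<?php
// Hypothetical helper: return $text without a leading UTF-8 BOM (EF BB BF).
function stripUtf8Bom(string $text): string
{
    if (strncmp($text, "\xEF\xBB\xBF", 3) === 0) {
        return substr($text, 3);
    }
    return $text;
}

// Hypothetical helper: rewrite a file in place without its BOM.
// Returns the number of bytes removed (0 or 3), or false on an I/O error.
function removeBomFromFile(string $path)
{
    $text = file_get_contents($path);
    if ($text === false) {
        return false;
    }
    $clean = stripUtf8Bom($text);
    if ($clean === $text) {
        return 0; // no BOM, leave the file untouched
    }
    if (file_put_contents($path, $clean) === false) {
        return false;
    }
    return strlen($text) - strlen($clean);
}
```

A return value of 3 corresponds to the file becoming 3 bytes smaller, as described above; running the function a second time on the same file returns 0.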

Finally, this is a significant problem. For anyone who wants to write their own plug-ins, adapt other people's plug-ins for their own use, or modify templates (which I think applies to everyone), it is best to understand the points above so that you are not caught off guard when problems arise.

source:php.cn