Home Backend Development PHP Tutorial The difference between utf-8 and utf-8 without BOM

The difference between utf-8 and utf-8 without BOM

Aug 08, 2016 am 09:20 AM
ascii bom cookie php

BOM——Byte Order Mark, which is the byte order mark

There is a character called "ZERO WIDTH NO-BREAK SPACE" in UCS encoding, and its encoding is FEFF. FFFE is a character that does not exist in UCS, so it should not appear in actual transmission. The UCS specification recommends that we transmit the characters "ZERO WIDTH NO-BREAK SPACE" before transmitting the byte stream. In this way, if the receiver receives FEFF, it indicates that the byte stream is Big-Endian; if it receives FFFE, it indicates that the byte stream is Little-Endian. Therefore the character "ZERO WIDTH NO-BREAK SPACE" is also called BOM.

UTF-8 does not require a BOM to indicate the byte order, but can use the BOM to indicate the encoding method. The UTF-8 encoding of the character "ZERO WIDTH NO-BREAK SPACE" is EF BB BF. So if the receiver receives a byte stream starting with EF BB BF, it knows that it is UTF-8 encoded.

In UTF-8 encoded files, the BOM occupies three bytes. If you use Notepad to save a text file as UTF-8 encoding, open the file with UE and switch to the hexadecimal editing state, you can see the FFFE at the beginning. This is a good way to identify UTF-8 encoded files. The software uses BOM to identify whether the file is UTF-8 encoded. Many software also require that the read file must have BOM. However, there are still many softwares that cannot recognize BOM.

In early versions of Firefox, extensions could not have BOM, but versions after Firefox 1.5 have begun to support BOM. Now I discovered that PHP does not support BOM either. PHP did not consider the BOM issue when it was designed, which means that it will not ignore the three characters of the BOM at the beginning of the UTF-8 encoded file.

Since it must be seen in Bo-Blog's wiki, Bo-Blog, which also uses PHP, is also troubled by BOM. Another trouble was mentioned: "Limited by the COOKIE sending mechanism, in files that already have a BOM at the beginning of these files, the COOKIE cannot be sent (because PHP has already sent the file header before the COOKIE is sent), so the login and logout functions Invalid. All functions that rely on COOKIE and SESSION are invalid. "This should be the reason why a blank page appears in the WordPress background, because any executed file contains a BOM, and these three characters will be sent, resulting in dependence on cookies and The session function is invalid.

The solution is, if it only contains English characters (or characters in ASCII encoding), just save the file in ASCII code. If you use an editor such as UE, click File->Convert->UTF-8 to ASCII, or select ASCII encoding in Save As. If it is a line ending in DOS format, you can open it with Notepad, click Save As, and select ASCII encoding. If it contains Chinese characters, you can use UE's save as function and select "UTF-8 without BOM".

BOM should not be added to utf-8. It has no use except letting the editor know that it is utf-8. In fact, the editor is fully capable of judging the encoding of a file based on characteristics among not too many encoding formats. Even if it cannot be automatically recognized, the editor should have a place to set the encoding. So I think BOM is redundant for utf-8.

Utf-16 only needs to add BOM. Because it is encoded in unicode order, it is two bytes in the BMP range, and it needs to be identified as big or little endian.

Actually, I think it is too stupid to introduce the concept of big and small endianness in utf-8. I don’t know what those standards committees think. The significance of the existence of big and small endianness lies in the processing method of the CPU. If the CPU processes big endian, then for little endian, a layer of conversion must be performed, which brings about a decrease in efficiency. But in practical applications, who cares about endianness? Text encoding gives rise to the concept of byte order. It can only be said that those who formulate standards are too rigid. For UTF-16, I think as long as the whole world follows a byte ordering method, there is no need to use BOM to mark it.

Having said that, PHP does not support UTF-16 encoded files. Because the $ symbol, for example, is also two bytes in UTF-8 and cannot be parsed by the PHP decoder. I don’t know if PHP6 will support this after the concept of unicode is introduced in internal processing.

Encoding problem is something that sounds simple but is actually very complicated. Many programs have the concept of hierarchical coding. Like MySQL, it is divided into concepts such as client->connection->storage and storage->connection->result. Storage is divided into system, database, table, and column. I sometimes think, is it necessary to make it so complicated, TNND. Like MySQL, who uses its features? Unless the two clients are allowed to operate in different encoding environments, there is no need to separate the client encoding. In most cases, just binary in/binary out

The above introduces the difference between utf-8 and utf-8 without BOM, including the relevant content. I hope it will be helpful to friends who are interested in PHP tutorials.

Statement of this Website
The content of this article is voluntarily contributed by netizens, and the copyright belongs to the original author. This site does not assume corresponding legal responsibility. If you find any content suspected of plagiarism or infringement, please contact admin@php.cn

Hot AI Tools

Undresser.AI Undress

Undresser.AI Undress

AI-powered app for creating realistic nude photos

AI Clothes Remover

AI Clothes Remover

Online AI tool for removing clothes from photos.

Undress AI Tool

Undress AI Tool

Undress images for free

Clothoff.io

Clothoff.io

AI clothes remover

Video Face Swap

Video Face Swap

Swap faces in any video effortlessly with our completely free AI face swap tool!

Hot Tools

Notepad++7.3.1

Notepad++7.3.1

Easy-to-use and free code editor

SublimeText3 Chinese version

SublimeText3 Chinese version

Chinese version, very easy to use

Zend Studio 13.0.1

Zend Studio 13.0.1

Powerful PHP integrated development environment

Dreamweaver CS6

Dreamweaver CS6

Visual web development tools

SublimeText3 Mac version

SublimeText3 Mac version

God-level code editing software (SublimeText3)

PHP 8.4 Installation and Upgrade guide for Ubuntu and Debian PHP 8.4 Installation and Upgrade guide for Ubuntu and Debian Dec 24, 2024 pm 04:42 PM

PHP 8.4 brings several new features, security improvements, and performance improvements with healthy amounts of feature deprecations and removals. This guide explains how to install PHP 8.4 or upgrade to PHP 8.4 on Ubuntu, Debian, or their derivati

7 PHP Functions I Regret I Didn't Know Before 7 PHP Functions I Regret I Didn't Know Before Nov 13, 2024 am 09:42 AM

If you are an experienced PHP developer, you might have the feeling that you’ve been there and done that already.You have developed a significant number of applications, debugged millions of lines of code, and tweaked a bunch of scripts to achieve op

How To Set Up Visual Studio Code (VS Code) for PHP Development How To Set Up Visual Studio Code (VS Code) for PHP Development Dec 20, 2024 am 11:31 AM

Visual Studio Code, also known as VS Code, is a free source code editor — or integrated development environment (IDE) — available for all major operating systems. With a large collection of extensions for many programming languages, VS Code can be c

Explain JSON Web Tokens (JWT) and their use case in PHP APIs. Explain JSON Web Tokens (JWT) and their use case in PHP APIs. Apr 05, 2025 am 12:04 AM

JWT is an open standard based on JSON, used to securely transmit information between parties, mainly for identity authentication and information exchange. 1. JWT consists of three parts: Header, Payload and Signature. 2. The working principle of JWT includes three steps: generating JWT, verifying JWT and parsing Payload. 3. When using JWT for authentication in PHP, JWT can be generated and verified, and user role and permission information can be included in advanced usage. 4. Common errors include signature verification failure, token expiration, and payload oversized. Debugging skills include using debugging tools and logging. 5. Performance optimization and best practices include using appropriate signature algorithms, setting validity periods reasonably,

How do you parse and process HTML/XML in PHP? How do you parse and process HTML/XML in PHP? Feb 07, 2025 am 11:57 AM

This tutorial demonstrates how to efficiently process XML documents using PHP. XML (eXtensible Markup Language) is a versatile text-based markup language designed for both human readability and machine parsing. It's commonly used for data storage an

PHP Program to Count Vowels in a String PHP Program to Count Vowels in a String Feb 07, 2025 pm 12:12 PM

A string is a sequence of characters, including letters, numbers, and symbols. This tutorial will learn how to calculate the number of vowels in a given string in PHP using different methods. The vowels in English are a, e, i, o, u, and they can be uppercase or lowercase. What is a vowel? Vowels are alphabetic characters that represent a specific pronunciation. There are five vowels in English, including uppercase and lowercase: a, e, i, o, u Example 1 Input: String = "Tutorialspoint" Output: 6 explain The vowels in the string "Tutorialspoint" are u, o, i, a, o, i. There are 6 yuan in total

Explain late static binding in PHP (static::). Explain late static binding in PHP (static::). Apr 03, 2025 am 12:04 AM

Static binding (static::) implements late static binding (LSB) in PHP, allowing calling classes to be referenced in static contexts rather than defining classes. 1) The parsing process is performed at runtime, 2) Look up the call class in the inheritance relationship, 3) It may bring performance overhead.

What are PHP magic methods (__construct, __destruct, __call, __get, __set, etc.) and provide use cases? What are PHP magic methods (__construct, __destruct, __call, __get, __set, etc.) and provide use cases? Apr 03, 2025 am 12:03 AM

What are the magic methods of PHP? PHP's magic methods include: 1.\_\_construct, used to initialize objects; 2.\_\_destruct, used to clean up resources; 3.\_\_call, handle non-existent method calls; 4.\_\_get, implement dynamic attribute access; 5.\_\_set, implement dynamic attribute settings. These methods are automatically called in certain situations, improving code flexibility and efficiency.

See all articles