The difference between UTF-8 and UTF-8 without BOM (PHP Tutorial, php.cn)


WBOY

Aug 08, 2016 09:20 AM


BOM stands for Byte Order Mark.

UCS contains a character called "ZERO WIDTH NO-BREAK SPACE", whose code point is U+FEFF. U+FFFE, by contrast, is not a valid character in UCS, so it should never appear in real data. The UCS specification recommends sending "ZERO WIDTH NO-BREAK SPACE" before the rest of the byte stream. If the receiver then sees the bytes FE FF, the stream is big-endian. If it sees FF FE, the stream is little-endian. This is why "ZERO WIDTH NO-BREAK SPACE" is also called the BOM.
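The detection rule above can be sketched in a few lines of Python. This is a minimal illustration, not any particular library's detector. The helper name `utf16_byte_order` is hypothetical, and the sketch assumes the sender really did put U+FEFF first.

```python
import codecs
from typing import Optional

def utf16_byte_order(data: bytes) -> Optional[str]:
    """Guess UTF-16 byte order from a leading U+FEFF (sketch; assumes the
    sender followed the recommendation and wrote the BOM first)."""
    if data.startswith(codecs.BOM_UTF16_BE):  # FE FF
        return "big-endian"
    if data.startswith(codecs.BOM_UTF16_LE):  # FF FE: would be U+FFFE, a non-character
        return "little-endian"
    return None  # no BOM: the byte order has to be agreed some other way

# Serialize U+FEFF followed by "Hi" in both byte orders.
big = ("\ufeff" + "Hi").encode("utf-16-be")
little = ("\ufeff" + "Hi").encode("utf-16-le")
print(big.hex(" "))     # fe ff 00 48 00 69
print(little.hex(" "))  # ff fe 48 00 69 00
print(utf16_byte_order(big), utf16_byte_order(little))  # big-endian little-endian
```

The check works because U+FFFE is never a legitimate character. Reading FF FE as big-endian would produce it, so that byte pair can only mean little-endian.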

UTF-8 does not need a BOM to indicate byte order, but a BOM can still be used to indicate the encoding. The UTF-8 encoding of "ZERO WIDTH NO-BREAK SPACE" is EF BB BF. A receiver that sees a byte stream starting with EF BB BF can therefore conclude that it is UTF-8 encoded.
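To see where EF BB BF comes from, here is a minimal Python sketch that applies the three-byte UTF-8 pattern (1110xxxx 10xxxxxx 10xxxxxx) to U+FEFF. The helper `utf8_encode_3byte` is hypothetical. Its result is checked against Python's built-in UTF-8 codec.

```python
def utf8_encode_3byte(cp: int) -> bytes:
    """Encode a code point in U+0800..U+FFFF with the three-byte UTF-8 form
    1110xxxx 10xxxxxx 10xxxxxx (surrogates are not rejected in this sketch)."""
    if not 0x0800 <= cp <= 0xFFFF:
        raise ValueError("code point outside the three-byte UTF-8 range")
    return bytes([
        0xE0 | (cp >> 12),          # top 4 bits
        0x80 | ((cp >> 6) & 0x3F),  # middle 6 bits
        0x80 | (cp & 0x3F),         # low 6 bits
    ])

bom = utf8_encode_3byte(0xFEFF)
print(bom.hex(" ").upper())  # EF BB BF
assert bom == "\ufeff".encode("utf-8")
```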

In a UTF-8 file, the BOM takes three bytes. Save a text file as UTF-8 in Notepad, open it in UltraEdit (UE), and switch to hex mode: you will see EF BB BF at the beginning. Checking for these bytes is an easy way to recognize a UTF-8 file. Some software relies on the BOM to decide whether a file is UTF-8, and many programs even require files to have one. However, there is still plenty of software that cannot recognize a BOM at all.
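The hex-editor check can also be done in code. This is a Python sketch with hypothetical helper names (`head_hex`, `has_utf8_bom`). It uses Python's `utf-8-sig` codec as a stand-in for an editor that writes a BOM, so it does not reproduce Notepad itself.

```python
import codecs
import os
import tempfile

def head_hex(path: str, n: int = 8) -> str:
    """First n bytes of a file in hex, like the top of a hex-editor view."""
    with open(path, "rb") as f:
        return f.read(n).hex(" ").upper()

def has_utf8_bom(path: str) -> bool:
    """True if the file begins with the UTF-8 BOM EF BB BF."""
    with open(path, "rb") as f:
        return f.read(3) == codecs.BOM_UTF8

# Stand-in for a BOM-writing editor: Python's "utf-8-sig" codec prepends EF BB BF.
with tempfile.NamedTemporaryFile("w", encoding="utf-8-sig", suffix=".txt", delete=False) as tmp:
    tmp.write("abc")
    path = tmp.name

try:
    dump = head_hex(path)
    found = has_utf8_bom(path)
finally:
    os.remove(path)

print(dump)   # EF BB BF 61 62 63
print(found)  # True
```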

Early versions of Firefox did not allow a BOM in extensions, although Firefox 1.5 and later support it. Now I have discovered that PHP does not support the BOM either. PHP was not designed with the BOM in mind, so it does not skip the three BOM bytes at the start of a UTF-8 file. It treats them as ordinary content and sends them to the client as output.
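Here is a hypothetical, simplified Python model of that behavior. Everything before the first `<?php` tag is treated as inline HTML and echoed unchanged. The function `inline_output` is made up for illustration and is not PHP's actual lexer.

```python
BOM = b"\xef\xbb\xbf"

def inline_output(source: bytes) -> bytes:
    """Hypothetical, simplified model (not PHP's real lexer): bytes before the
    first '<?php' tag are treated as inline HTML and echoed to the client as-is."""
    idx = source.find(b"<?php")
    return source if idx == -1 else source[:idx]

clean = b"<?php setcookie('user', 'bob');"
with_bom = BOM + clean

print(inline_output(clean))     # b''
print(inline_output(with_bom))  # b'\xef\xbb\xbf'
```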

As Bo-Blog's wiki shows, Bo-Blog, another PHP application, is also affected by the BOM. The wiki points out a further problem: "Because of how cookies are sent, a file that already begins with a BOM cannot send cookies (PHP has already sent the HTTP headers before the cookie is set), so login and logout stop working. Every feature that relies on cookies and sessions breaks." This is probably why the WordPress admin panel sometimes shows a blank page. If any executed file contains a BOM, those three bytes are sent as output, and every feature that depends on cookies and sessions fails.
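PHP sends the HTTP headers as soon as the first byte of output is produced. After that point, `setcookie()` and `session_start()` can no longer add headers and PHP reports "headers already sent". The toy Python model below illustrates that rule, assuming output buffering is off. The classes `ToyResponse` and `HeadersAlreadySent` and the function `run_login` are hypothetical and do not mirror PHP's API.

```python
class HeadersAlreadySent(Exception):
    """Raised when a header is added after body output has started."""

class ToyResponse:
    """Toy model of PHP's rule with output buffering off (an assumption):
    headers, including Set-Cookie, can only be added before the first body byte."""

    def __init__(self) -> None:
        self.headers = []          # header lines, e.g. "Set-Cookie: a=b"
        self.body = bytearray()
        self.headers_sent = False

    def write(self, data: bytes) -> None:
        if data:
            self.headers_sent = True  # first output flushes the headers
        self.body += data

    def set_cookie(self, name: str, value: str) -> None:
        if self.headers_sent:
            raise HeadersAlreadySent(f"cannot set cookie {name!r}: output already started")
        self.headers.append(f"Set-Cookie: {name}={value}")

def run_login(leading_bytes: bytes) -> bool:
    """Simulate a login script; leading_bytes stands for whatever precedes <?php."""
    resp = ToyResponse()
    resp.write(leading_bytes)
    try:
        resp.set_cookie("session", "abc123")
        return True
    except HeadersAlreadySent as exc:
        print(exc)
        return False

print(run_login(b""))              # True
print(run_login(b"\xef\xbb\xbf"))  # prints the error, then False
```

The three BOM bytes are enough to count as output, which is why a BOM breaks every cookie- and session-based feature at once.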

The fix depends on what the file contains. If it contains only English characters (that is, only ASCII characters), simply save it as ASCII. In an editor such as UE, choose File->Convert->UTF-8 to ASCII, or select ASCII encoding in Save As. If the file uses DOS line endings, you can also open it in Notepad, choose Save As, and select ASCII encoding. If the file contains Chinese characters, use UE's Save As and choose "UTF-8 without BOM".
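The same fix can be scripted. This Python sketch uses hypothetical helper names (`strip_utf8_bom`, `remove_bom_in_place`). It removes a leading EF BB BF and leaves everything else untouched. For ASCII-only content, BOM-less UTF-8 is byte-for-byte identical to ASCII, so both options described above give the same file.

```python
import codecs
from pathlib import Path

def strip_utf8_bom(data: bytes) -> bytes:
    """Return data without a leading UTF-8 BOM, i.e. the 'UTF-8 without BOM' form."""
    if data.startswith(codecs.BOM_UTF8):
        return data[len(codecs.BOM_UTF8):]
    return data

def remove_bom_in_place(path: Path) -> bool:
    """Rewrite a file without its BOM; returns True if a BOM was removed."""
    data = path.read_bytes()
    cleaned = strip_utf8_bom(data)
    if cleaned == data:
        return False
    path.write_bytes(cleaned)
    return True

source = codecs.BOM_UTF8 + "<?php echo '中文';".encode("utf-8")
fixed = strip_utf8_bom(source)
print(fixed[:5])                       # b'<?php'
print(fixed.isascii())                 # False: Chinese text, so keep UTF-8 (without BOM)
print(strip_utf8_bom(fixed) == fixed)  # True: stripping twice changes nothing
```

To clean a whole project, one could loop over something like `Path("src").rglob("*.php")` and call `remove_bom_in_place` on each file, where `src` is a hypothetical source directory. Back up the files first.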

A BOM should not be added to UTF-8. Its only use is to tell an editor that the file is UTF-8. In practice, an editor can already work out the encoding from the byte patterns, since there are not that many encodings to choose from. Even when automatic detection fails, the editor should offer a setting for choosing the encoding. So I think the BOM is redundant for UTF-8.
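Here is a rough sketch of such detection in Python. It is a simple heuristic, not the algorithm any particular editor uses. The function name `guess_encoding` and the `gbk` fallback are assumptions. It relies on UTF-8's strict structure: text in a legacy multi-byte encoding is rarely valid UTF-8 by accident.

```python
import codecs

def guess_encoding(data: bytes, fallback: str = "gbk") -> str:
    """Rough heuristic, not any particular editor's algorithm: check for a BOM,
    then try strict UTF-8, else assume a legacy encoding (gbk is an assumption)."""
    if data.startswith(codecs.BOM_UTF8):
        return "utf-8-sig"
    if data.startswith((codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE)):
        return "utf-16"
    try:
        data.decode("utf-8")  # strict mode: any malformed sequence raises
        return "utf-8"
    except UnicodeDecodeError:
        return fallback

print(guess_encoding("中文".encode("utf-8")))  # utf-8 (no BOM needed)
print(guess_encoding("中文".encode("gbk")))    # gbk (these bytes are not valid UTF-8)
```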

Only UTF-16 really needs a BOM. It encodes characters directly by their Unicode code points, so every character in the BMP takes two bytes, and the reader has to know whether those two bytes are big-endian or little-endian.
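A short Python illustration of why UTF-16 needs the byte order. The two BMP characters used here are arbitrary examples. The same code units produce different bytes in each order, and reading them with the wrong order silently yields different characters. Python's generic `utf-16` decoder uses a leading BOM to pick the order.

```python
import codecs

text = "A中"  # U+0041 and U+4E2D, both in the BMP
be = text.encode("utf-16-be")
le = text.encode("utf-16-le")
print(be.hex(" "))  # 00 41 4e 2d
print(le.hex(" "))  # 41 00 2d 4e

# Read with the wrong byte order, the same bytes become different characters.
misread = be.decode("utf-16-le")
print(misread == text)  # False

# With a BOM in front, Python's generic "utf-16" decoder picks the order itself.
print((codecs.BOM_UTF16_BE + be).decode("utf-16") == text)  # True
print((codecs.BOM_UTF16_LE + le).decode("utf-16") == text)  # True
```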

Frankly, I think it was silly to bring big- and little-endian byte order into the UTF encodings at all, and I don't know what the standards committees were thinking. Endianness matters because of how CPUs process data. If a CPU is big-endian, little-endian data must first be converted, which costs some efficiency. But in practice, who cares about the endianness of text? That text encoding ended up with a byte-order concept at all just shows how rigid the people who write standards are. For UTF-16, as long as the whole world agreed on a single byte order, there would be no need for a BOM to mark it.

That said, PHP does not support UTF-16 source files either. In UTF-16 even ASCII characters such as the $ sign take two bytes (in UTF-8, $ is the single byte 0x24), so the PHP parser cannot read them. I don't know whether PHP 6 will support this once Unicode is introduced into its internal processing.
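The byte-level difference is easy to see in Python. The sketch below uses a plain byte search as a simplified stand-in for a byte-oriented parser like PHP's. It does not reproduce PHP's lexer.

```python
src = "<?php $x = 1;"
utf8 = src.encode("utf-8")
utf16 = src.encode("utf-16-le")

print(len("$".encode("utf-8")), len("$".encode("utf-16-le")))  # 1 2
print(utf8[:7])   # b'<?php $'
print(utf16[:8])  # b'<\x00?\x00p\x00h\x00'

# A byte-oriented scanner searching for the open tag only finds it in the UTF-8 bytes.
print(utf8.find(b"<?php"), utf16.find(b"<?php"))  # 0 -1
```

In UTF-16 every ASCII character is interleaved with a zero byte, so byte patterns that work for ASCII and UTF-8 no longer match.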

Encoding is one of those problems that sounds simple but is actually very complicated. Many programs split encoding into layers. MySQL, for example, distinguishes client -> connection -> storage and storage -> connection -> result, and storage is further divided into server, database, table, and column. Sometimes I wonder whether it really needs to be this complicated, damn it. Who actually uses all of MySQL's encoding features? Unless two clients must work in different encoding environments, there is no need for a separate client encoding. In most cases, binary in, binary out is all you need.
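Here is a hypothetical Python simulation of a single client-to-server hop. It shows why layered encoding settings only work when every layer agrees. The function `through_layers` is made up. In MySQL, the corresponding settings are session variables such as `character_set_client`, `character_set_connection` and `character_set_results`. Matching settings round-trip the text, while a mismatch produces mojibake.

```python
def through_layers(text: str, client_sends: str, server_assumes: str) -> str:
    """Hypothetical simulation of one client->server hop: the client encodes the
    text with one charset, and the server decodes the bytes with the charset it
    has been told the client uses."""
    return text.encode(client_sends).decode(server_assumes)

s = "café"
print(through_layers(s, "utf-8", "utf-8"))    # café (settings agree)
print(through_layers(s, "utf-8", "latin-1"))  # cafÃ© (mismatch: mojibake)
```

Passing the bytes through unchanged ("binary in, binary out") avoids the mismatch case entirely, which is the author's point.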


Statement of this Website

The content of this article is voluntarily contributed by netizens, and the copyright belongs to the original author. This site does not assume corresponding legal responsibility. If you find any content suspected of plagiarism or infringement, please contact admin@php.cn
