Re-understand unicode and utf8 encoding-PHP Tutorial-php.cn

Table of Contents

Re-recognize unicode and utf8 encoding

The relationship between unicode and utf8

Let’s talk about the conversion from UTF-8 to Unicode first

Let’s look at the conversion from Unicode to UTF-8

Let’s talk about the problem

Front-end

Backend

Home

Backend Development

PHP Tutorial

Re-understand unicode and utf8 encoding

WBOYWBOYWBOYWBOYWBOYWBOYWBOYWBOYWBOYWBOYWBOYWBOYWB

Aug 08, 2016 am 09:23 AM

gt unicode

Re-recognize unicode and utf8 encoding

Until today, just now to be precise, I didn’t know that UTF-8 encoding and Unicode encoding are different, there is a difference囧
There is a certain connection between them Yes, look at their differences:

<code>UTF-8的长度是不一定的，有可能是1、2、3字节
Unicode长度一定，2个字节（USC-2）
UTF-8可以和Unicode互相转换</code>

Copy after login

The relationship between unicode and utf8

Unicode (16)	UTF-8 (binary)
0000 - 007F	0xxxxxxx
0080 - 07FF	110xxxxx 10xxxxxx
0800 - FFFF	1110xxxx 10xxxxxx 10xxxxxx

The above table has two meanings. The first one is obviously Unicode Correspondence to UTF-8 character range , there is another way to see how Unicode and UTF-8 can be converted to each other:

Let’s talk about the conversion from UTF-8 to Unicode first

UTF-8 encoded binary is matched with the above three formats, and the fixed bits are removed after matching (Non-x position in the table), and then every 8-bit group from right to left. If there are not enough 8-bits, the left side will not be collected. Make up 2 bytes and 16 bits. These 16 bits represent the Unicode encoding corresponding to UTF-8. , take a look at the following examples:
Re-understand unicode and utf8 encoding
The text encoding format in the above picture is UTF-8, and you can use WinHex to see its hexadecimal representation

<code>字符	=> UTF-8	  => UTF-8二进制=> 去掉固定位置凑够16位的二进制 => 16进制

汉 	=> E6B189 => 11100110 10110001 10001001	=> 01101100 01001001 => 6C49
汉 	=> E5AD97 => 11100101 10101101 10010111	=> 01011011	01010111 => 5B57

#下面是在chrome命令行下面运行的结果
'\u6C49'
"汉"
'\u5B57'
"字"

#到这里的话，从UTF-8转换到Unicode已经是一件非常容易的事了，看看转换的伪代码
读取一个字节，11100110
判断该UTF-8字符的格式，属于第三种，3个字节
继续读取2个字节得到 11100101 10101101 10010111
按照格式去掉固定位     1011011 01010111
不够16位，左边补零    01011011 01010111  => 5B57</code>

Copy after login

Let’s look at the conversion from Unicode to UTF-8

<code>5B57
获取5B57所在的Unicode范围，0800 <= 5B57 <= FFFF，得知5B57的UTF-8有三个字节，形式为1110xxxx 10xxxxxx 10xxxxxx
获取5B57的二进制编码 101101101010111
用上一步骤的二进制编码从右至左拼接UTF-8编码 11100101 10101101 10010111 </code>

Copy after login

Let’s talk about the problem

Let’s talk about the cause of today’s problem. Many words are input from the front end. Each word in UTF-8 format has a maximum of 30 bytes, so verification will be done on the front end and backend respectively. JavaScript uses Unicode encoding. The back-end program uses UTF-8 encoding, and the current solution is as follows

Front-end

<code>function utf8_bytes(str)
{
	var len = 0, unicode;
	for(var i = 0; i < str.length; i++)
	{
		unicode = str.charCodeAt(i);
		if(unicode < 0x0080) {
			++len;
		} else if(unicode < 0x0800) {
			len += 2;
		} else if(unicode <= 0xFFFF) {
			len += 3;
		}else {
			throw "characters must be USC-2!!"
		}
	}
	return len;
}

#例子
utf8_bytes('asdasdas')
8
utf8_bytes('yrt燕睿涛')
12</code>

Copy after login

Backend

<code>#对于GBK字符串
$len = ceil(strlen(bin2hex(iconv('GBK', 'UTF-8', $word)))/2);
#对于UTF8字符串
$len = ceil(strlen(bin2hex($word))/2);</code>

Copy after login

5/21/2015 8:21:53 PM

The copyright of this article belongs to the author iforever(luluyrt＠163 .com), any form of reprinting is prohibited without the consent of the author. After reprinting the article, a link to the author and the original text must be provided in an obvious position on the article page, otherwise we reserve the right to pursue legal liability.

The above has introduced the re-understanding of unicode and utf8 encoding, including aspects of it. I hope it will be helpful to friends who are interested in PHP tutorials.

Statement of this Website

The content of this article is voluntarily contributed by netizens, and the copyright belongs to the original author. This site does not assume corresponding legal responsibility. If you find any content suspected of plagiarism or infringement, please contact admin@php.cn

Hot AI Tools

Undresser.AI Undress

AI-powered app for creating realistic nude photos

AI Clothes Remover

Online AI tool for removing clothes from photos.

Undress AI Tool

Undress images for free

Clothoff.io

AI clothes remover

AI Hentai Generator

Generate AI Hentai for free.

Hot Article

R.E.P.O. Energy Crystals Explained and What They Do (Yellow Crystal)

1 months ago By 尊渡假赌尊渡假赌尊渡假赌

R.E.P.O. Best Graphic Settings

1 months ago By 尊渡假赌尊渡假赌尊渡假赌

Assassin's Creed Shadows: Seashell Riddle Solution

3 weeks ago By DDD

What's New in Windows 11 KB5054979 & How to Fix Update Issues

2 weeks ago By DDD

Will R.E.P.O. Have Crossplay?

1 months ago By 尊渡假赌尊渡假赌尊渡假赌

Hot Tools

Notepad++7.3.1

Easy-to-use and free code editor

SublimeText3 Chinese version

Chinese version, very easy to use

Zend Studio 13.0.1

Powerful PHP integrated development environment

Dreamweaver CS6

Visual web development tools

SublimeText3 Mac version

God-level code editing software (SublimeText3)

Hot Topics

Where is the login entrance for gmail email?

7549

CakePHP Tutorial

1382

What is the format of the account name of steam

win11 activation key permanent

nyt connections hints and answers

Related knowledge

What are the differences between Huawei GT3 Pro and GT4? Dec 29, 2023 pm 02:27 PM

Many users will choose the Huawei brand when choosing smart watches. Among them, Huawei GT3pro and GT4 are very popular choices. Many users are curious about the difference between Huawei GT3pro and GT4. Let’s introduce the two to you. . What are the differences between Huawei GT3pro and GT4? 1. Appearance GT4: 46mm and 41mm, the material is glass mirror + stainless steel body + high-resolution fiber back shell. GT3pro: 46.6mm and 42.9mm, the material is sapphire glass + titanium body/ceramic body + ceramic back shell 2. Healthy GT4: Using the latest Huawei Truseen5.5+ algorithm, the results will be more accurate. GT3pro: Added ECG electrocardiogram and blood vessel and safety

Fix: Snipping tool not working in Windows 11 Aug 24, 2023 am 09:48 AM

Why Snipping Tool Not Working on Windows 11 Understanding the root cause of the problem can help find the right solution. Here are the top reasons why the Snipping Tool might not be working properly: Focus Assistant is On: This prevents the Snipping Tool from opening. Corrupted application: If the snipping tool crashes on launch, it might be corrupted. Outdated graphics drivers: Incompatible drivers may interfere with the snipping tool. Interference from other applications: Other running applications may conflict with the Snipping Tool. Certificate has expired: An error during the upgrade process may cause this issu simple solution. These are suitable for most users and do not require any special technical knowledge. 1. Update Windows and Microsoft Store apps

In-depth understanding of PHP: Implementation method of converting JSON Unicode to Chinese Mar 05, 2024 pm 02:48 PM

In-depth understanding of PHP: Implementation method of converting JSONUnicode to Chinese During development, we often encounter situations where we need to process JSON data, and Unicode encoding in JSON will cause us some problems in some scenarios, especially when Unicode needs to be converted When encoding is converted to Chinese characters. In PHP, there are some methods that can help us achieve this conversion process. A common method will be introduced below and specific code examples will be provided. First, let us first understand the Un in JSON

How to convert unicode to Chinese Dec 14, 2023 am 10:57 AM

Unicode is a character encoding standard used to represent various languages and symbols. To convert Unicode encoding to Chinese characters, you can use Python's built-in functions chr() and ord().

How to Fix Can't Connect to App Store Error on iPhone Jul 29, 2023 am 08:22 AM

Part 1: Initial Troubleshooting Steps Checking Apple’s System Status: Before delving into complex solutions, let’s start with the basics. The problem may not lie with your device; Apple's servers may be down. Visit Apple's System Status page to see if the AppStore is working properly. If there's a problem, all you can do is wait for Apple to fix it. Check your internet connection: Make sure you have a stable internet connection as the "Unable to connect to AppStore" issue can sometimes be attributed to a poor connection. Try switching between Wi-Fi and mobile data or resetting network settings (General > Reset > Reset Network Settings > Settings). Update your iOS version:

Try the method to solve the problem of Chinese garbled characters in Eclipse Jan 03, 2024 pm 05:28 PM

Are you troubled by Chinese garbled characters in Eclipse? To try these solutions, you need specific code examples 1. Background introduction With the continuous development of computer technology, Chinese plays an increasingly important role in software development. However, many developers encounter garbled code problems when using Eclipse for Chinese development, which affects work efficiency. Then, this article will introduce some common garbled code problems and give corresponding solutions and code examples to help readers solve the Chinese garbled code problem in Eclipse. 2. Common garbled code problems and solution files

PHP Tutorial: How to Convert JSON Unicode to Chinese Characters Mar 05, 2024 pm 06:36 PM

JSON (JavaScriptObjectNotation) is a lightweight data exchange format commonly used for data exchange between web applications. When processing JSON data, we often encounter Unicode-encoded Chinese characters (such as "u4e2du6587") and need to convert them into readable Chinese characters. In PHP, we can achieve this conversion through some simple methods. Next, we will detail how to convert JSONUnico

php提交表单通过后,弹出的对话框怎样在当前页弹出,该如何解决 Jun 13, 2016 am 10:23 AM

php提交表单通过后,弹出的对话框怎样在当前页弹出php提交表单通过后,弹出的对话框怎样在当前页弹出而不是在空白页弹出？想实现这样的效果：而不是空白页弹出:------解决方案--------------------如果你的验证用PHP在后端，那么就用Ajax；仅供参考：HTML code

See all articles