Difference: 1. An ASCII character occupies 1 byte, while a Unicode character stored in the common two-byte form (UCS-2/UTF-16) usually occupies 2 bytes. 2. ASCII is a single-byte encoding and cannot represent Chinese; Unicode aims to cover the characters of all languages. 3. For plain English text, the two-byte Unicode form requires twice as much storage space as ASCII.
The operating environment of this tutorial: Windows 7 system, Dell G3 computer.
ASCII encoding
- ASCII uses 7-bit or 8-bit binary combinations to represent 128 or 256 possible characters. Standard ASCII, also called basic ASCII, uses 7 binary digits to represent all uppercase and lowercase letters, the digits 0 to 9, punctuation marks, and the special control characters used in American English. When a character is stored in an 8-bit byte, the remaining highest bit is 0; historically that bit was sometimes used for parity checking instead.
- Problem: ASCII is a single-byte encoding and cannot represent Chinese (encoding Chinese requires at least 2 bytes). China therefore created the GB2312 encoding for Chinese. But the world has many different languages, so a unified encoding is needed.
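The limitation described above can be checked with a minimal Python sketch. The sample characters `A` and `中` are chosen here for illustration; the codec names `ascii` and `gb2312` are the ones shipped with Python's standard library.

```python
# 'A' fits in ASCII: a single byte with the value 65.
ascii_a = "A".encode("ascii")
print(ascii_a, len(ascii_a))

# A Chinese character has no ASCII representation, so encoding fails.
try:
    "中".encode("ascii")
    ascii_ok = True
except UnicodeEncodeError:
    ascii_ok = False
print("ASCII can encode 中:", ascii_ok)

# GB2312 represents the same character with two bytes.
gb_zhong = "中".encode("gb2312")
print(gb_zhong.hex(), len(gb_zhong))
```

The failed ASCII encode is the practical form of "cannot be used to represent Chinese": there is simply no byte value assigned to the character.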
Unicode
- Unicode unifies all languages into one set of codes, so the problem of garbled characters no longer arises.
- Unicode text is most commonly stored with two bytes per character (UCS-2/UTF-16); rarely used characters outside that two-byte range need 4 bytes. Modern operating systems and most programming languages support Unicode directly.
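As a rough illustration of the two-byte and four-byte cases, the sketch below prints the code point and the UTF-16 size of three sample characters. The emoji is an assumed example of a "rare" character; `utf-16-be` is used so that no byte order mark is added to the output.

```python
samples = ["A", "中", "😀"]  # sample characters chosen for illustration

for ch in samples:
    code_point = ord(ch)               # the character's Unicode number
    utf16 = ch.encode("utf-16-be")     # big-endian, no byte order mark
    print(f"{ch!r}: U+{code_point:04X}, {len(utf16)} bytes in UTF-16")

two_byte = "中".encode("utf-16-be")    # fits in one 16-bit unit
four_byte = "😀".encode("utf-16-be")   # needs two 16-bit units
```

Characters whose code point is above U+FFFF are the ones that need the second 16-bit unit.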
The difference between Unicode and ASCII
- An ASCII character is 1 byte, while a Unicode character in the two-byte form is usually 2 bytes.
The ASCII encoding of the letter A is 65 in decimal and 01000001 in binary; in the two-byte Unicode form, you only need to pad it with zeros in front, giving 00000000 01000001.
- New problem: if everything is unified into Unicode, the problem of garbled characters disappears. However, if the text you write is almost entirely English, the two-byte Unicode form requires twice as much storage space as ASCII, which is uneconomical for both storage and transmission.
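The doubling can be measured directly. This is a small sketch that assumes `utf-16-be` as the two-byte form and uses an arbitrary English sample string.

```python
text = "Hello, world"  # sample English text, chosen for illustration

ascii_bytes = text.encode("ascii")
two_byte_form = text.encode("utf-16-be")  # two bytes per character here
print(len(ascii_bytes), len(two_byte_form))

# The two-byte form of 'A' is the ASCII byte with a zero byte in front.
bits_ascii = format(ord("A"), "08b")
bits_two_byte = " ".join(format(b, "08b") for b in "A".encode("utf-16-be"))
print(bits_ascii)
print(bits_two_byte)
```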
UTF-8
- So, in the spirit of economy, the Unicode encoding is converted into a "variable-length encoding": UTF-8.
- UTF-8 encodes a Unicode character into 1 to 4 bytes depending on the size of its number (the original design allowed up to 6 bytes, but the current standard stops at 4). Commonly used English letters are encoded into 1 byte, Chinese characters are usually 3 bytes, and only rarely used characters are encoded into 4 bytes. If the text you want to transmit contains a lot of English characters, using UTF-8 saves space.
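To make the "different number sizes" rule concrete, here is a hand-written sketch of the UTF-8 byte layout under the current definition, which tops out at 4 bytes. The function names `utf8_encode` and `bits` are hypothetical helpers written for this example; in practice `str.encode("utf-8")` does the same job, and the loop checks the sketch against it.

```python
def utf8_encode(code_point: int) -> bytes:
    """Encode one Unicode code point as UTF-8 (1 to 4 bytes)."""
    if code_point < 0 or code_point > 0x10FFFF or 0xD800 <= code_point <= 0xDFFF:
        raise ValueError("not an encodable Unicode code point")
    if code_point < 0x80:            # 7 bits: 0xxxxxxx
        return bytes([code_point])
    if code_point < 0x800:           # 11 bits: 110xxxxx 10xxxxxx
        return bytes([0xC0 | (code_point >> 6),
                      0x80 | (code_point & 0x3F)])
    if code_point < 0x10000:         # 16 bits: 1110xxxx 10xxxxxx 10xxxxxx
        return bytes([0xE0 | (code_point >> 12),
                      0x80 | ((code_point >> 6) & 0x3F),
                      0x80 | (code_point & 0x3F)])
    # 21 bits: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
    return bytes([0xF0 | (code_point >> 18),
                  0x80 | ((code_point >> 12) & 0x3F),
                  0x80 | ((code_point >> 6) & 0x3F),
                  0x80 | (code_point & 0x3F)])


def bits(data: bytes) -> str:
    """Render bytes as space-separated 8-bit binary groups."""
    return " ".join(format(b, "08b") for b in data)


for ch in ["A", "é", "中", "😀"]:   # sample characters, chosen for illustration
    encoded = utf8_encode(ord(ch))
    assert encoded == ch.encode("utf-8")  # agrees with the built-in codec
    print(ch, len(encoded), bits(encoded))
```

The leading bits of the first byte tell a decoder how many bytes the character occupies, which is what lets the length vary per character.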
Character | ASCII | Unicode | UTF-8 |
A | 01000001 | 00000000 01000001 | 01000001 |
中 | x | 01001110 00101101 | 11100100 10111000 10101101 |
The table above also shows an additional advantage of UTF-8: ASCII can be regarded as a subset of UTF-8, because any ASCII text is already valid UTF-8. Therefore, a large amount of legacy software that only supports ASCII can continue to work under UTF-8.
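The subset relationship can be confirmed with a short sketch; the sample strings are arbitrary.

```python
legacy_bytes = "Hello".encode("ascii")
same_in_utf8 = "Hello".encode("utf-8")

# For pure ASCII text the two encodings produce identical bytes.
identical = legacy_bytes == same_in_utf8
print(identical)

# Bytes written by an ASCII-only program decode cleanly as UTF-8.
decoded = legacy_bytes.decode("utf-8")
print(decoded)

# Every byte of a non-ASCII character has its high bit set, so it can
# never be mistaken for an ASCII character.
zhong = "中".encode("utf-8")
high_bits_set = all(b >= 0x80 for b in zhong)
print(high_bits_set)
```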
How common character encoding works in computers
In computer memory, text is uniformly held as Unicode. When it needs to be saved to the hard disk or transmitted, it is converted to UTF-8.
- When editing with Notepad, the UTF-8 bytes read from the file are converted into Unicode characters and held in memory. After editing is complete, the Unicode characters are converted back into UTF-8 when saving to the file.
- When browsing the web, the server converts the dynamically generated Unicode content into UTF-8 and then transmits it to the browser.
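The read, edit, and save cycle described above might look like the following sketch. The file name `note.txt` and its contents are hypothetical, and a temporary directory stands in for the user's disk; a real editor's internals will differ.

```python
import os
import tempfile

# Hypothetical file in a temporary directory, standing in for the disk.
path = os.path.join(tempfile.mkdtemp(), "note.txt")
with open(path, "wb") as f:
    f.write("Hello 中".encode("utf-8"))

# Open: UTF-8 bytes from disk are decoded into Unicode text in memory.
with open(path, "rb") as f:
    raw = f.read()
text = raw.decode("utf-8")

# Edit: the change is made on the in-memory Unicode string.
text += "文"

# Save: the Unicode string is encoded back to UTF-8 bytes.
with open(path, "wb") as f:
    f.write(text.encode("utf-8"))

with open(path, "rb") as f:
    saved = f.read()
print(len(text), "characters,", len(saved), "bytes on disk")
```

The character count and the byte count differ because each Chinese character takes 3 bytes in UTF-8 while each ASCII character takes 1.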