This article introduces Unicode and UTF-8 and explains the difference between the two. I hope it serves as a useful reference.
What is Unicode?
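Unicode is a character encoding standard that assigns a unique number, called a code point, to each character and symbol, regardless of platform, program, or language. Code points range from 0 to 1,114,111 (U+0000 to U+10FFFF). Early versions of Unicode were limited to the 65,536 values (0 to 2^16 − 1) that fit in two bytes, but that limit no longer applies, so a character does not always fit in two bytes.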
What is UTF-8?
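UTF-8 is a standard encoding format that converts Unicode code points into a byte stream. It is variable-length: each code point is encoded in 1 to 4 bytes. (The original design allowed up to 6 bytes, but RFC 3629 restricted UTF-8 to 4 bytes to match the Unicode range.)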
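To make the variable length concrete, here is a small Python sketch that counts the UTF-8 bytes for a few sample characters. The helper name `utf8_length` and the choice of samples are only for illustration; the encoding itself is done by Python's built-in `str.encode`.

```python
# Count how many bytes UTF-8 needs for a single character.
def utf8_length(ch: str) -> int:
    return len(ch.encode("utf-8"))

# Sample characters written as escapes: A, e-acute, euro sign, an emoji.
samples = ["A", "\u00e9", "\u20ac", "\U0001F600"]

for ch in samples:
    print(f"U+{ord(ch):04X} takes {utf8_length(ch)} byte(s) in UTF-8")
```

The four samples take 1, 2, 3, and 4 bytes respectively, which covers every length that UTF-8 can produce.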
The difference between Unicode and UTF-8
Unicode is a character set, while UTF-8 is an encoding rule.
A character set is a list of uniquely numbered characters (these numbers are called "code points"). To put it simply, each character is assigned a unique ID. For example, in the Unicode character set, the letter A is hexadecimal 41 (written U+0041), which is 65 in decimal.
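The mapping between a character and its number can be checked directly in Python with the built-in `ord` and `chr` functions. This is just a quick illustration; the variable names are arbitrary.

```python
# Look up the code point of a character, and go back again.
code_point = ord("A")        # character -> number
as_hex = hex(code_point)     # the same number in hexadecimal
back = chr(0x41)             # number -> character

print(code_point)  # 65
print(as_hex)      # 0x41
print(back)        # A
```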
Encoding rules: an encoding is a rule for converting code points into byte sequences, and decoding is the reverse process. (This is not encryption: nothing is kept secret, and anyone who knows the rule can reverse it.) It is an algorithm for turning a list of numbers into binary so that it can be stored on disk.
For example, UTF-8 translates the sequence of code points 1, 2, 3, 4 into these bytes:
00000001 00000010 00000011 00000100
Our data is now translated to binary and can be saved to disk. Code points below 128 each fit in a single byte, as here; larger code points take two to four bytes.
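The translation above can be reproduced with a hand-written encoder. The following is a minimal sketch rather than a production implementation: the function names `utf8_encode` and `to_bits` are made up for this example, and in practice you would simply call Python's built-in `str.encode("utf-8")`.

```python
def utf8_encode(code_point: int) -> bytes:
    """Encode one Unicode code point as UTF-8 (illustrative sketch)."""
    if not 0 <= code_point <= 0x10FFFF:
        raise ValueError("code point out of Unicode range")
    if 0xD800 <= code_point <= 0xDFFF:
        raise ValueError("surrogates cannot be encoded in UTF-8")
    if code_point < 0x80:        # 1 byte: 0xxxxxxx
        return bytes([code_point])
    if code_point < 0x800:       # 2 bytes: 110xxxxx 10xxxxxx
        return bytes([0xC0 | (code_point >> 6),
                      0x80 | (code_point & 0x3F)])
    if code_point < 0x10000:     # 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx
        return bytes([0xE0 | (code_point >> 12),
                      0x80 | ((code_point >> 6) & 0x3F),
                      0x80 | (code_point & 0x3F)])
    # 4 bytes: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
    return bytes([0xF0 | (code_point >> 18),
                  0x80 | ((code_point >> 12) & 0x3F),
                  0x80 | ((code_point >> 6) & 0x3F),
                  0x80 | (code_point & 0x3F)])


def to_bits(data: bytes) -> str:
    """Show each byte as eight binary digits, separated by spaces."""
    return " ".join(f"{b:08b}" for b in data)


encoded = b"".join(utf8_encode(n) for n in [1, 2, 3, 4])
print(to_bits(encoded))              # 00000001 00000010 00000011 00000100
print(to_bits(utf8_encode(0x20AC)))  # 11100010 10000010 10101100
```

The second call encodes the euro sign (U+20AC), which shows how a larger code point is spread over three bytes, each starting with marker bits.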
[Figure: Unicode and UTF-8 relationship diagram]
Conclusion
UTF-8 is the encoding that converts between binary data and numbers (code points); Unicode is the character set that maps those numbers to characters.
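The two steps can be seen separately in a short Python sketch. The sample bytes are chosen only for illustration: UTF-8 decoding turns the bytes into code points, and the Unicode character set gives each code point its character.

```python
# Six bytes as they might be read from a file on disk.
raw = b"\xe4\xbd\xa0\xe5\xa5\xbd"

# Step 1 (UTF-8): bytes -> text made of code points.
text = raw.decode("utf-8")
code_points = [ord(ch) for ch in text]
labels = [f"U+{cp:04X}" for cp in code_points]
print(labels)  # ['U+4F60', 'U+597D']

# Step 2 (Unicode): each code point corresponds to one character.
print(text)

# Encoding the text again gives back the original bytes.
round_trip = text.encode("utf-8")
print(round_trip == raw)  # True
```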