MySQL character set

What is a character set?

So that computers can represent Chinese, Japanese, English, Greek, and other scripts, each commonly used character and symbol is assigned a code. This set of assignments is a character set.

The character set determines how text is stored.

For a computer, a character set plays the role that a written language plays for people.

For example:

If I write in English, the text has to be stored with a character set that covers English.
If I write in Chinese but store the text with a character set that only covers English characters, nobody can read or understand the result. This is what we call garbled text.
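
The example above can be sketched in a few lines of Python. This is only an illustration, assuming UTF-8 as the character set that covers Chinese and Latin-1 as the wrong one; the variable names are made up for this sketch.

```python
# Illustrative sketch: store Chinese text, then read it back with the wrong character set.
original = "中文"

# Encode with a character set that covers Chinese (UTF-8 is assumed here).
stored_bytes = original.encode("utf-8")

# Reading the same bytes as Latin-1, a Western European character set, gives garbled text.
garbled = stored_bytes.decode("latin-1")

# Reading the bytes with the matching character set recovers the text.
recovered = stored_bytes.decode("utf-8")

# A character set that has no Chinese characters cannot store the text at all.
try:
    original.encode("ascii")
    ascii_can_store = True
except UnicodeEncodeError:
    ascii_can_store = False

print(garbled == original)    # the garbled text no longer matches
print(recovered == original)  # the correct character set round-trips
print(ascii_can_store)
```

The bytes on disk are the same in both cases. Only the character set used to interpret them changes.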

There are dozens, even hundreds, of character sets, so we don't need to know every one of them, or how each one turns stored bytes into human-readable characters.

Key knowledge of character sets

We only need to understand:

Commonly used character sets
Which character set to use in the database

Common character sets:

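Character set	Description	Bytes per character
ASCII	American Standard Code for Information Interchange	1 byte
GBK	Chinese character internal code extension specification	1 byte (ASCII characters) or 2 bytes (Chinese characters)
Unicode	Universal code (assigns code points; the byte length depends on the encoding)	For example, 4 bytes in UTF-32
UTF-8	Variable-length character encoding for Unicode	1 to 4 bytes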
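
The byte lengths in the table can be checked with a short Python sketch. The helper name byte_length and the sample characters are assumptions chosen for illustration; the codec names are the ones Python's standard library provides.

```python
def byte_length(text, encoding):
    """Return how many bytes `text` occupies in the given encoding."""
    return len(text.encode(encoding))

# (encoding, sample character) pairs chosen for illustration.
samples = [
    ("ascii", "A"),
    ("gbk", "A"),          # GBK keeps ASCII characters at one byte
    ("gbk", "中"),         # a Chinese character takes two bytes in GBK
    ("utf-8", "A"),
    ("utf-8", "中"),
    ("utf-8", "😀"),       # characters outside the basic plane take four bytes
    ("utf-32-be", "中"),   # UTF-32 always uses four bytes
]

for encoding, ch in samples:
    # Print the code point instead of the character to keep the output ASCII-only.
    print(encoding, f"U+{ord(ch):04X}", byte_length(ch, encoding))
```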

ASCII

ASCII uses 7-bit or 8-bit binary numbers to represent 128 or 256 possible characters. Standard ASCII, also called basic ASCII, uses 7 bits to represent all uppercase and lowercase letters, the digits 0 to 9, punctuation marks, and the special control characters used in American English.
Among them:
0~31 and 127 (33 in total) are control characters or special communication characters (the rest are displayable characters). Control characters include LF (line feed), CR (carriage return), FF (form feed), DEL (delete), BS (backspace), and BEL (bell). Communication characters include SOH (start of heading), EOT (end of transmission), and ACK (acknowledgment). The ASCII values 8, 9, 10, and 13 correspond to backspace, tab, line feed, and carriage return respectively. They have no graphic representation, but they affect how text is displayed, depending on the application.
32~126 (95 in total) are printable characters (32 is the space), of which 48~57 are the ten Arabic numerals 0 to 9.
65~90 are the 26 uppercase English letters, 97~122 are the 26 lowercase English letters, and the rest are punctuation marks, arithmetic symbols, and so on.
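
The ranges above can be verified with Python's built-in chr and ord. This is a minimal sketch; the variable names are chosen only for this illustration.

```python
# Split the 128 standard ASCII codes into the ranges described above.
control_codes = [c for c in range(128) if c < 32 or c == 127]
printable_codes = [c for c in range(128) if 32 <= c <= 126]

digits = "".join(chr(c) for c in range(48, 58))      # 48~57
uppercase = "".join(chr(c) for c in range(65, 91))   # 65~90
lowercase = "".join(chr(c) for c in range(97, 123))  # 97~122

print(len(control_codes), len(printable_codes))
print(digits)
print(uppercase)
print(lowercase)

# Codes 8, 9, 10 and 13 are backspace, tab, line feed and carriage return.
print([chr(c) for c in (8, 9, 10, 13)] == ["\b", "\t", "\n", "\r"])
```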

GBK

GBK is a Chinese character encoding specification defined by the People's Republic of China. It extends the earlier GB 2312 standard and is backward compatible with it.
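
Backward compatibility can be illustrated with Python's gb2312 and gbk codecs. This is a sketch under assumptions: the helper can_encode and the two sample characters are chosen for illustration, with the traditional character 龍 taken as an example of a character that GBK adds on top of GB 2312.

```python
def can_encode(text, encoding):
    """Return True if every character of `text` exists in the given encoding."""
    try:
        text.encode(encoding)
        return True
    except UnicodeEncodeError:
        return False

simplified = "中"   # a common simplified character, present in GB 2312
traditional = "龍"  # a traditional character, assumed to be covered by GBK only

# A character from GB 2312 keeps the same two bytes in GBK.
same_bytes = simplified.encode("gb2312") == simplified.encode("gbk")

print(same_bytes)
print(can_encode(traditional, "gb2312"))
print(can_encode(traditional, "gbk"))
```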

Unicode

Unicode (also called the universal code) is a character encoding standard developed by an international organization, the Unicode Consortium. It is designed to accommodate all of the world's scripts and symbols, so that text can be converted and processed across languages and platforms.
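
In practice, Unicode gives every character a number called a code point, whatever the script. A minimal Python sketch, with the helper name code_point chosen only for illustration:

```python
def code_point(ch):
    """Format the Unicode code point of a single character as U+XXXX."""
    return f"U+{ord(ch):04X}"

# One character each from English, Greek, Chinese and Japanese.
for ch in ["A", "α", "中", "あ"]:
    print(code_point(ch))

# chr() goes the other way, from a code point back to the character.
print(chr(0x4E2D) == "中")
```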

UTF-8

UTF-8 is a variable-length character encoding for Unicode. A fixed-width two-byte Unicode encoding takes up twice as much space as ASCII for English text, and the high byte, which is always 0 for ASCII characters, is wasted. To solve this problem, intermediate encoding formats appeared. They are called Unicode Transformation Formats (UTF).
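
The wasted high byte is easy to see by encoding the same English text in two ways. This sketch assumes UTF-16 (big-endian) as the two-byte encoding; the sample string is arbitrary.

```python
text = "MySQL"

two_byte = text.encode("utf-16-be")  # two bytes per character for this text
utf8 = text.encode("utf-8")          # one byte per ASCII character
plain_ascii = text.encode("ascii")

print(len(two_byte), len(utf8))

# Every high byte in the two-byte form is zero for ASCII characters.
print(two_byte[0::2] == b"\x00" * len(text))

# For ASCII text, UTF-8 produces exactly the same bytes as ASCII.
print(utf8 == plain_ascii)
```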