MySQL character set

What is a character set?

To let computers represent Chinese, Japanese, English, Greek, and commonly used symbols, each character is assigned a code. This set of encoded characters is called a character set.

The character set determines how text is stored.

A character set is to a computer what a written language is to people.

For example:

If I write in English, the text has to be stored with a character set that covers English.
If I write in Chinese but store it with an English-only character set, nobody can read or understand the result. This is what we call gibberish (garbled text).
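
The effect can be reproduced outside the database. The following is a minimal Python sketch, not MySQL itself: it stores Chinese text as UTF-8 bytes and then reads those bytes back with the wrong character set (Latin-1 is chosen here only as an illustrative assumption).

```python
# Chinese text written with one character set and read with another.
text = "中文"

utf8_bytes = text.encode("utf-8")        # how the text is actually stored
garbled = utf8_bytes.decode("latin-1")   # read back with the wrong character set
restored = utf8_bytes.decode("utf-8")    # read back with the right one

# An English-only character set cannot store the text at all.
try:
    text.encode("ascii")
    ascii_ok = True
except UnicodeEncodeError:
    ascii_ok = False

print(repr(garbled))   # unreadable characters instead of the original text
print(restored)
print(ascii_ok)
```

The bytes never change; only the character set used to interpret them does, which is why choosing the same character set for writing and reading matters.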

There are dozens, even hundreds, of character sets, so we don't need to know every one of them, or exactly how each one turns stored bytes into human-readable characters.

Key knowledge of character sets

We only need to understand:

  1. Commonly used character sets
  2. Which character set to use in the database

Commonly used character sets:

Character set  Description                                         Byte length
ASCII          American Standard Code for Information Interchange  1 byte
GBK            Chinese Internal Code Extension Specification       1 or 2 bytes
Unicode        Universal character set                             Depends on encoding (4 bytes in UTF-32)
UTF-8          Variable-length encoding of Unicode                 1 to 4 bytes

ASCII

ASCII code uses a specified 7-bit or 8-bit binary number combination to represent 128 or 256 possible characters. Standard ASCII code, also called Basic ASCII code, uses 7-bit binary numbers to represent all uppercase and lowercase letters, numbers 0 to 9, punctuation marks, and special control characters used in American English.
Among them:
0~31 and 127 (33 in total) are control characters or communication characters (the rest are printable characters). Examples of control characters are LF (line feed), CR (carriage return), FF (form feed), DEL (delete), BS (backspace), and BEL (bell); examples of communication characters are SOH (start of heading), EOT (end of transmission), and ACK (acknowledge). ASCII values 8, 9, 10, and 13 are the backspace, tab, line feed, and carriage return characters respectively. They have no graphic representation of their own, but affect how text is displayed, depending on the application.
32~126 (95 in total) are printable characters (32 is the space), of which 48~57 are the ten Arabic numerals 0 to 9.
65~90 are 26 uppercase English letters, 97~122 are 26 lowercase English letters, and the rest are some punctuation marks, arithmetic symbols, etc.
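
The ranges listed above can be checked with a short Python sketch that uses the built-in `chr` and `ord` functions; the variable names are illustrative only.

```python
# Rebuild the ASCII ranges described above from their code values.
digits = "".join(chr(code) for code in range(48, 58))    # 48~57
upper = "".join(chr(code) for code in range(65, 91))     # 65~90
lower = "".join(chr(code) for code in range(97, 123))    # 97~122

control_count = len(range(0, 32)) + 1    # 0~31 plus 127 (DEL)
printable_count = len(range(32, 127))    # 32~126, space included

# Backspace, tab, line feed, and carriage return.
special_codes = [ord(ch) for ch in "\b\t\n\r"]

print(digits, upper, lower)
print(control_count, printable_count, special_codes)
```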

GBK

GBK is a Chinese character encoding specification defined by the People's Republic of China. It extends the earlier GB2312 standard and remains backward compatible with it.
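
As a rough illustration, Python's built-in `gbk` and `gb2312` codecs show both the byte lengths and the backward compatibility. This is a sketch using Python's codecs, not MySQL's own implementation.

```python
# ASCII characters take one byte in GBK; Chinese characters take two.
ascii_part = "A".encode("gbk")
hanzi_gbk = "中".encode("gbk")

# A character that exists in GB2312 has the same bytes in GBK.
hanzi_gb2312 = "中".encode("gb2312")

print(len(ascii_part), len(hanzi_gbk))
print(hanzi_gbk == hanzi_gb2312)
```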

Unicode

Unicode (also called the universal code) is a character encoding scheme developed by an international organization. It is designed to accommodate all the scripts and symbols in the world, in order to meet the requirements of cross-language and cross-platform text conversion and processing.
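
In Unicode, every character has one number, called a code point, regardless of language. A small Python sketch (Python strings are Unicode, so `ord` returns the code point):

```python
# One numbering scheme covers English, Greek, and Chinese characters alike.
code_points = {ch: ord(ch) for ch in ["A", "α", "中"]}

for ch, cp in code_points.items():
    # Code points are conventionally written as U+ followed by hex digits.
    print(ch, cp, f"U+{cp:04X}")
```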

UTF-8

UTF-8 is a variable-length character encoding for Unicode, so it also covers the universal code. A fixed-width Unicode encoding that uses 2 bytes per character takes twice as much space as ASCII for English text, and the high byte, which is always 0 for ASCII characters, is wasted. To solve this problem, intermediate encoding formats appeared. They are called Unicode Transformation Formats, or UTF for short.
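
The space argument can be seen in a short Python sketch. The sample characters are arbitrary, and UTF-16 (big-endian) stands in here for the fixed 2-byte form mentioned above.

```python
# UTF-8 uses as many bytes as each character needs.
samples = ["A", "é", "中", "😀"]
utf8_lengths = [len(ch.encode("utf-8")) for ch in samples]

# In a 2-byte encoding, an ASCII letter carries a useless zero high byte.
two_byte_a = "A".encode("utf-16-be")
one_byte_a = "A".encode("utf-8")

print(utf8_lengths)
print(two_byte_a, one_byte_a)
```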

Encoding to be used in actual work

For Chinese text, the commonly used character sets are UTF-8 and GBK.

The ones actually used in MySQL are as follows:

Collation        Description
gbk_chinese_ci   Simplified Chinese, case-insensitive
utf8_general_ci  Unicode (multi-language), case-insensitive

Look at the names in Figure 1 and you will find that a MySQL character set name of this kind consists of three parts:

  1. Character set
  2. Language
  3. Type

In the last part, bin means a binary collation, which compares characters by their byte values, and ci means that comparison and sorting are case-insensitive.
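
To make the three-part naming concrete, here is a minimal Python sketch that splits such a name on underscores. `split_collation` is a hypothetical helper written for this illustration, not a MySQL or library function, and it only assumes names shaped like the ones discussed here.

```python
def split_collation(name):
    """Split a name like 'utf8_general_ci' into (character set, language, type)."""
    parts = name.split("_")
    charset = parts[0]                        # first part: character set
    kind = parts[-1]                          # last part: ci, bin, ...
    language = "_".join(parts[1:-1]) or None  # middle part may be absent
    return charset, language, kind


print(split_collation("utf8_general_ci"))
print(split_collation("gbk_chinese_ci"))
print(split_collation("utf8_bin"))
```

Binary names such as utf8_bin have no language part, which is why the sketch returns `None` for it.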

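Note:
In MySQL, UTF-8 is written as utf8, without the hyphen. MySQL's utf8 (an alias for utf8mb3) stores at most 3 bytes per character; the utf8mb4 character set covers the full 4-byte UTF-8 range.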


(Figure 1)
