In-depth MySQL character set settings
There is a character set converter between the MySQL client and the MySQL server. Its behavior is controlled by three variables:
character_set_client => gbk: tells the converter that the data sent by the client is encoded in gbk
character_set_connection => gbk: the converter converts the data received from the client into gbk
character_set_results => gbk: the converter converts query results into gbk before returning them to the client
Note: All three variables can be set at once with set names gbk
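As a minimal sketch of the note above, the following statements inspect the converter's settings and then set them together. The value gbk simply follows the article's example; the expansion of SET NAMES into three assignments follows the MySQL manual's description and ignores the optional COLLATE clause.

```sql
-- List the character_set_* variables of the current session
-- (client, connection and results are the three discussed here).
SHOW VARIABLES LIKE 'character_set_%';

-- Set all three at once ...
SET NAMES gbk;

-- ... which has the same effect as these three assignments.
SET character_set_client     = gbk;
SET character_set_results    = gbk;
SET character_set_connection = gbk;
```

These settings apply only to the current connection, so they have to be issued again after reconnecting.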
Example:
create table test(
name varchar(64) NOT NULL
) charset utf8; # utf8 here is the character set the server uses to store the table's data
First, insert a row into the table test:
insert into test values('test');
The value 'test' is then stored in the database in utf8.
Process:
First, the MySQL client sends the data to the MySQL server. As the data passes through the character set converter, the converter treats the incoming bytes as gbk (character_set_client) and converts them to the character_set_connection encoding, which is also gbk, so the data is in gbk at this point. Then, when the converter is about to hand the data to the server for storage, it finds that the server stores the data in utf8, so it automatically converts the data from gbk to utf8 internally.
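The final gbk-to-utf8 step can be observed directly. This is a sketch that assumes the table test from the example already exists; the sample character 中 is an illustration chosen here, written as a hex literal with the _gbk introducer so that the result does not depend on how the terminal is encoded.

```sql
-- Assumes the table `test` (charset utf8) from the example exists.
-- _gbk X'D6D0' is the character 中 expressed as gbk bytes.
INSERT INTO test VALUES (_gbk X'D6D0');

-- For the row just inserted, HEX(name) is E4B8AD (the utf8 bytes of 中),
-- LENGTH(name) is 3 bytes and CHAR_LENGTH(name) is 1 character.
SELECT name, HEX(name), LENGTH(name), CHAR_LENGTH(name)
FROM test;
```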
When will garbled characters appear?
1. When the client data does not match character_set_client
Suppose the page is declared as utf8 through header('Content-type:text/html;charset=utf8'); so the data the client sends is actually in utf8. When the data passes through the character set converter, character_set_client=gbk and character_set_connection=gbk, so the converter assumes the data is already gbk and does not convert it.
However, when the character set converter sends the data to the server, it finds that the format required by the server is utf8, so it treats the current data as gbk and converts it to utf8. This step is wrong, because the data was utf8 to begin with, and what gets stored is garbled.
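The effect of misreading utf8 bytes as gbk can be reproduced without a web page. The sketch below uses the two characters 你好 as sample data (an assumption for illustration, not taken from the article). On MySQL 8.0, utf8 is an alias of utf8mb3 and may produce a deprecation warning.

```sql
-- E4BDA0 E5A5BD are the utf8 bytes of the two characters 你好.
-- Read correctly as utf8, the string has 2 characters.
SELECT CHAR_LENGTH(_utf8 X'E4BDA0E5A5BD');

-- Misread as gbk (two bytes per character), the same bytes
-- become 3 unrelated characters.
SELECT CHAR_LENGTH(_gbk X'E4BDA0E5A5BD');

-- "Converting" the misread string to utf8 bakes the error in:
-- the resulting bytes are no longer E4BDA0E5A5BD.
SELECT HEX(CONVERT(_gbk X'E4BDA0E5A5BD' USING utf8));
```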
2. When the result does not match the client page
If character_set_results is set to utf8 but the client page expects gbk, the returned data is displayed as garbled characters.
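A minimal sketch of the fix for this second case, assuming the page really is gbk: make character_set_results agree with what the client displays.

```sql
-- The page is gbk: have the server convert results to gbk
-- before returning them to the client.
SET character_set_results = gbk;

-- Confirm the setting for the current session.
SHOW VARIABLES LIKE 'character_set_results';
```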
All available character sets can be listed with the show character set statement.
latin1 character set: Maxlen 1
utf8 character set: Maxlen 3
gbk character set: Maxlen 2
Note: The Maxlen column shows the maximum number of bytes used to store a character.
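The Maxlen values of the three character sets discussed here can be checked with a filtered query. This is a sketch; utf8mb3 is included in the filter because newer MySQL 8.0 releases list utf8 under that name.

```sql
-- Show only the character sets compared in this article.
-- The Maxlen column is the maximum number of bytes per character.
SHOW CHARACTER SET
WHERE Charset IN ('latin1', 'gbk', 'utf8', 'utf8mb3');
```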
When will data be lost?
Comparing the three character sets above, the maximum number of bytes used to store a character differs from one to the next, with utf8 the largest and latin1 the smallest. A character that needs two or three bytes in gbk or utf8 generally has no representation in single-byte latin1. Therefore, if the conversion through the character set converter is not handled properly, data will be lost and cannot be recovered.
For example:
Suppose the value of character_set_connection is changed to latin1.
The gbk data sent from the client is then converted into latin1. Because a character in gbk can take more bytes than latin1 can hold, the characters that latin1 cannot represent are lost.
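The loss can be seen with an explicit conversion. The sketch below again uses 中 as gbk bytes (sample data chosen here for illustration) and relies on MySQL replacing characters that cannot be represented in the target character set with a question mark.

```sql
-- 中 as gbk bytes, converted to latin1: the result is a literal '?' (hex 3F).
SELECT CONVERT(_gbk X'D6D0' USING latin1)      AS converted,
       HEX(CONVERT(_gbk X'D6D0' USING latin1)) AS converted_hex;

-- Converting that '?' back to gbk does not restore the original character:
-- the round trip still yields 3F, not D6D0.
SELECT HEX(CONVERT(CONVERT(_gbk X'D6D0' USING latin1) USING gbk)) AS round_trip_hex;
```

Once the question mark has been stored, the original character cannot be recovered from the database.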
Summary: