What does collate in mysql mean?

百草
Release: 2023-07-11 14:32:31

In MySQL, COLLATE specifies a collation, that is, a set of rules for comparing and sorting characters. Stored character data can be compared and sorted differently depending on the character set and collation in use. The character set determines which characters can be stored, and the collation determines how those characters are sorted and compared.

Environment for this tutorial: Windows 10, MySQL 8.0, Dell G3 computer.

Normally, when creating a database or table, you choose how character data is sorted by specifying a character set and a collation. If no collation is specified, MySQL uses the default one. Common collations include utf8mb4_general_ci and utf8mb4_unicode_ci (both are case-insensitive and accent-insensitive), etc.

Which collation to choose depends on the use case. For example, when searching and sorting data, some collations compare case-sensitively while others do not. When designing a database, select the collation that matches the actual needs and business rules.

Various sorting methods can be achieved by using different collate rules. For example, character data can be sorted case-insensitively, meaning "A" and "a" will be considered equal. In contrast, if case-sensitive collate rules were used, then "A" and "a" would be treated as different characters.
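As a minimal sketch of this difference, the statements below compare the same two literals under a case-insensitive collation and under a binary one. The _utf8mb4 introducer is used only so that the example does not depend on the connection character set, and utf8mb4_bin stands in here for a case-sensitive collation; both choices are assumptions made for illustration.

```sql
-- Case-insensitive collation: 'A' and 'a' compare as equal (returns 1).
SELECT _utf8mb4'A' = _utf8mb4'a' COLLATE utf8mb4_general_ci AS equal_ci;

-- Binary collation: the comparison distinguishes case (returns 0).
SELECT _utf8mb4'A' = _utf8mb4'a' COLLATE utf8mb4_bin AS equal_bin;
```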

In addition, collate rules can also affect string comparison operations. Under different collate rules, the comparison results between characters may be different. For example, under some collate rules, the letter "a" may be considered larger than the letter "Z", while under other rules the opposite is true.
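The "a" versus "Z" case can be checked directly. This is only a sketch using two collations picked for illustration: under utf8mb4_bin the character codes are compared (0x61 for "a" is greater than 0x5A for "Z"), while under utf8mb4_general_ci case is ignored and "a" sorts before "Z".

```sql
-- Binary comparison by character code: 'a' is greater than 'Z' (returns 1).
SELECT _utf8mb4'a' > _utf8mb4'Z' COLLATE utf8mb4_bin AS a_gt_z_bin;

-- Case-insensitive comparison: 'a' sorts before 'Z' (returns 0).
SELECT _utf8mb4'a' > _utf8mb4'Z' COLLATE utf8mb4_general_ci AS a_gt_z_ci;
```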

It should be noted that the collate rules apply not only to the sorting and comparison of character data, but also to the results when string functions and operators are used in queries. Therefore, when writing SQL queries, you should consider the collate rules used to ensure you get the expected results.

Running the SHOW CREATE TABLE statement in MySQL displays the CREATE TABLE statement for a table. For example:

CREATE TABLE `table1` (
    `id` bigint(20) unsigned NOT NULL AUTO_INCREMENT,
    `field1` text COLLATE utf8_unicode_ci NOT NULL COMMENT '字段1',
    `field2` varchar(128) COLLATE utf8_unicode_ci NOT NULL DEFAULT '' COMMENT '字段2',
    PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci;
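One way to inspect these settings yourself is sketched below. It assumes a table named table1 already exists in the currently selected database; the second query reads the same information from information_schema so that only the character set and collation of each column are listed.

```sql
-- Print the full CREATE TABLE statement, including any COLLATE clauses.
SHOW CREATE TABLE table1;

-- List only the character set and collation of each column.
-- Non-character columns (such as id) show NULL for both.
SELECT COLUMN_NAME, CHARACTER_SET_NAME, COLLATION_NAME
FROM information_schema.COLUMNS
WHERE TABLE_SCHEMA = DATABASE()
  AND TABLE_NAME = 'table1';
```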

Most of this statement is self-explanatory, but today we are going to discuss the COLLATE keyword. What does the utf8_unicode_ci value that follows it mean? Asked in an interview, even for a DBA position, this question would probably stump most candidates.

What is COLLATE used for?

Developers who use phpMyAdmin may find this very familiar, because the Chinese header in its interface already gives the answer: utf8_unicode_ci is a rule used for sorting.

Every character-type column in MySQL, such as VARCHAR, CHAR, and TEXT columns, needs a COLLATE to tell MySQL how to sort and compare its values. In short, COLLATE affects the order produced by ORDER BY, the rows selected by greater-than and less-than comparisons in a WHERE condition, and the results of DISTINCT, GROUP BY, and HAVING. In addition, when MySQL builds an index on a character-type column, COLLATE also affects how the index is built, although we do not perceive this effect directly. Wherever character comparison or sorting is involved, COLLATE is involved.
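The effect on DISTINCT and WHERE can be seen with a small experiment. This is a sketch under stated assumptions: collate_demo is a hypothetical temporary table created only for the example, and its column is deliberately given the binary collation utf8mb4_bin so that the contrast with a case-insensitive collation is visible.

```sql
-- Hypothetical scratch table; the column uses a binary (case-sensitive) collation.
CREATE TEMPORARY TABLE collate_demo (
    name VARCHAR(16) CHARACTER SET utf8mb4 COLLATE utf8mb4_bin NOT NULL
);
INSERT INTO collate_demo (name) VALUES ('apple'), ('Apple'), ('APPLE');

-- With the column's binary collation the three spellings are distinct (returns 3).
SELECT COUNT(DISTINCT name) AS distinct_bin FROM collate_demo;

-- Under a case-insensitive collation they collapse into one value (returns 1).
SELECT COUNT(DISTINCT name COLLATE utf8mb4_general_ci) AS distinct_ci FROM collate_demo;

-- The WHERE filter is affected in the same way (returns 1, then 3).
SELECT COUNT(*) AS matches_bin FROM collate_demo WHERE name = 'apple';
SELECT COUNT(*) AS matches_ci FROM collate_demo
WHERE name COLLATE utf8mb4_general_ci = 'apple';
```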

Differences between the various COLLATE values

COLLATE is usually tied to the data encoding (CHARSET). Generally speaking, each CHARSET supports multiple COLLATE values and designates one of them as its default. For example, the default COLLATE for latin1 is latin1_swedish_ci, the default for gbk is gbk_chinese_ci, and the default for utf8mb4 is utf8mb4_general_ci in MySQL 5.7 and earlier (utf8mb4_0900_ai_ci from MySQL 8.0).
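The default COLLATE of each CHARSET can be looked up on your own server rather than memorized. The query below is a small sketch against information_schema; the three character set names are simply the ones mentioned above, and the result depends on the MySQL version.

```sql
-- Show the default collation that the server associates with each character set.
SELECT CHARACTER_SET_NAME, DEFAULT_COLLATE_NAME
FROM information_schema.CHARACTER_SETS
WHERE CHARACTER_SET_NAME IN ('latin1', 'gbk', 'utf8mb4');
```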

A brief digression: MySQL has two encodings, utf8 and utf8mb4. In MySQL, forget utf8 and always use utf8mb4. This is a legacy issue: MySQL's utf8 supports only characters encoded in at most 3 bytes, so text that needs 4 bytes per character (emoji, for example) cannot be stored with it. For such text you must use utf8mb4.
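The 3-byte versus 4-byte limit can be checked as follows. This is an illustrative sketch: the hex literal x'F09F9880' is the UTF-8 encoding of U+1F600 (an emoji), written in hex so that the example does not depend on the client's encoding. On recent 8.0 releases the old utf8 character set is listed as utf8mb3.

```sql
-- MAXLEN is the maximum number of bytes per character:
-- 3 for utf8 (utf8mb3) and 4 for utf8mb4.
SELECT CHARACTER_SET_NAME, MAXLEN
FROM information_schema.CHARACTER_SETS
WHERE CHARACTER_SET_NAME LIKE 'utf8%';

-- One emoji is a single character that occupies four bytes in utf8mb4.
SELECT CHAR_LENGTH(_utf8mb4 x'F09F9880') AS chars,  -- 1
       LENGTH(_utf8mb4 x'F09F9880')      AS bytes;  -- 4
```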

Many COLLATE names end in _ci, short for Case Insensitive, which means that "A" and "a" are treated as equal when sorting and comparing: SELECT * FROM table1 WHERE field1 = "a" also returns rows whose field1 is "A". COLLATE names with the _cs suffix are Case Sensitive.
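To see the two suffixes side by side, the sketch below uses latin1, which ships both a _ci and a _cs variant of the same collation. The choice of latin1_general_ci and latin1_general_cs is an assumption made for illustration.

```sql
-- _ci: case-insensitive, so 'A' equals 'a' (returns 1).
SELECT _latin1'A' = _latin1'a' COLLATE latin1_general_ci AS equal_ci;

-- _cs: case-sensitive, so 'A' differs from 'a' (returns 0).
SELECT _latin1'A' = _latin1'a' COLLATE latin1_general_cs AS equal_cs;

-- List every case-sensitive collation the server offers.
SHOW COLLATION LIKE '%\_cs';
```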

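In MySQL, the SHOW COLLATION statement lists every COLLATE that the server supports. Taking utf8mb4 as an example, all the COLLATE values supported by this character set are shown in the figure below.

[Figure: all COLLATE values related to utf8mb4 in MySQL]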
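The list in the figure can be reproduced on any server. A minimal sketch follows; the exact rows returned depend on the MySQL version.

```sql
-- List every collation available for the utf8mb4 character set.
SHOW COLLATION WHERE Charset = 'utf8mb4';

-- Equivalent filter by name prefix.
SHOW COLLATION LIKE 'utf8mb4%';
```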

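The figure shows collations specific to the languages of many countries. The three most commonly used in China are utf8mb4_general_ci (the default before MySQL 8.0), utf8mb4_unicode_ci, and utf8mb4_bin. Let's look at how the three differ.

First, utf8mb4_bin simply treats every character as a binary string and compares from the highest bit to the lowest, that is, by numeric character code. It is therefore obviously case-sensitive.

For Chinese and English, utf8mb4_unicode_ci and utf8mb4_general_ci behave the same, so for systems used only within China either will do. For the letters of some Western languages, however, utf8mb4_unicode_ci matches local conventions better than utf8mb4_general_ci, which is one of MySQL's older standards. For example, the German letter "ß" is equivalent to the two letters "ss" in utf8mb4_unicode_ci (which matches German usage), whereas in utf8mb4_general_ci it is equivalent to the letter "s". These small differences between the two collations are hard to notice in ordinary development. We seldom sort directly on text fields, and even if one or two letters end up out of order, would that really be catastrophic for the system? Judging by the discussions found online, more people recommend utf8mb4_unicode_ci, but they do not strongly object to systems that use the default, nor regard it as a serious problem.

Conclusion: utf8mb4_unicode_ci is recommended, but a system that already uses utf8mb4_general_ci does not need to spend time migrating.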
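The "ß" example can be tried directly. In this sketch the letter is written as the hex literal x'C39F' (its UTF-8 encoding) so that the result does not depend on the client's encoding; the column aliases are made up for readability.

```sql
-- x'C39F' is the UTF-8 encoding of the German letter 'ß'.
SELECT _utf8mb4 x'C39F' = _utf8mb4'ss' COLLATE utf8mb4_unicode_ci AS unicode_eq_ss,  -- 1
       _utf8mb4 x'C39F' = _utf8mb4'ss' COLLATE utf8mb4_general_ci AS general_eq_ss,  -- 0
       _utf8mb4 x'C39F' = _utf8mb4's'  COLLATE utf8mb4_general_ci AS general_eq_s;   -- 1
```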

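Another point to note is that, starting with MySQL 8.0, the default CHARSET is no longer Latin1 but utf8mb4, and the default COLLATE has changed to utf8mb4_0900_ai_ci. utf8mb4_0900_ai_ci is broadly a further refinement of the Unicode collations: 0900 refers to the version of the Unicode Collation Algorithm (UCA 9.0.0), and ai means accent-insensitive, so that e, è, é, ê and ë are all treated as equal.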
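Accent insensitivity can be demonstrated in the same way. This sketch requires MySQL 8.0 for the 0900 collations; utf8mb4_0900_as_cs (accent-sensitive, case-sensitive) is brought in only as a contrast, and x'C3A9' is the UTF-8 encoding of "é".

```sql
-- ai: 'e' and 'é' compare as equal (returns 1).
-- as: the accent makes them different (returns 0).
SELECT _utf8mb4'e' = _utf8mb4 x'C3A9' COLLATE utf8mb4_0900_ai_ci AS accent_insensitive,
       _utf8mb4'e' = _utf8mb4 x'C3A9' COLLATE utf8mb4_0900_as_cs AS accent_sensitive;
```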

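COLLATE setting levels and their priority

COLLATE can be set at the instance level, the database level, the table level, the column level, and in the SQL statement itself. The instance-level setting is the collation_server system variable, set in the MySQL configuration file or on the startup command line.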
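The instance-level values currently in effect can be read from the server's system variables. A minimal sketch:

```sql
-- Server-wide defaults used when a new database specifies nothing.
SELECT @@character_set_server, @@collation_server;

-- All collation-related variables (connection, database, server).
SHOW VARIABLES LIKE 'collation%';
```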

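The statement for setting COLLATE at the database level is as follows:

CREATE DATABASE db_name DEFAULT CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci;

If CHARSET and COLLATE are not set at the database level, the database defaults to the instance-level settings. In versions before MySQL 8.0, if you change nothing, the default CHARSET is Latin1 and the default COLLATE is latin1_swedish_ci. Starting with MySQL 8.0, the default CHARSET is utf8mb4 and the default COLLATE is utf8mb4_0900_ai_ci.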
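The database-level defaults can be verified after creation. In this sketch collate_demo_db is a hypothetical database name chosen for the example; the second statement reads back what was stored.

```sql
-- Hypothetical database created with explicit defaults.
CREATE DATABASE IF NOT EXISTS collate_demo_db
    DEFAULT CHARACTER SET utf8mb4
    COLLATE utf8mb4_unicode_ci;

-- Read back the defaults recorded for that database.
SELECT SCHEMA_NAME, DEFAULT_CHARACTER_SET_NAME, DEFAULT_COLLATION_NAME
FROM information_schema.SCHEMATA
WHERE SCHEMA_NAME = 'collate_demo_db';
```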

Table-level COLLATE is set by adding the relevant clause to CREATE TABLE, for example:

CREATE TABLE table_name (

……

) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_unicode_ci;

If CHARSET and COLLATE are not set at the table level, the table inherits them from the database level.

Column-level settings are specified when the column is declared in CREATE TABLE, for example:

CREATE TABLE table_name (

`field1` VARCHAR(64) CHARACTER SET utf8mb4 COLLATE utf8mb4_general_ci NOT NULL DEFAULT '',

……

) ……

If CHARSET and COLLATE are not set at the column level, the column inherits them from the table level.
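Table-to-column inheritance can be observed with a small experiment. This is a sketch, not taken from the original article: collate_inherit_demo, field_a, and field_b are hypothetical names, and the table is assumed to be created in the currently selected database.

```sql
-- field_a declares nothing and inherits the table default;
-- field_b overrides it with its own collation.
CREATE TABLE collate_inherit_demo (
    id BIGINT UNSIGNED NOT NULL AUTO_INCREMENT,
    field_a VARCHAR(64) NOT NULL DEFAULT '',
    field_b VARCHAR(64) CHARACTER SET utf8mb4 COLLATE utf8mb4_bin NOT NULL DEFAULT '',
    PRIMARY KEY (id)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_unicode_ci;

-- field_a reports utf8mb4_unicode_ci, field_b reports utf8mb4_bin.
SELECT COLUMN_NAME, COLLATION_NAME
FROM information_schema.COLUMNS
WHERE TABLE_SCHEMA = DATABASE()
  AND TABLE_NAME = 'collate_inherit_demo'
  AND COLLATION_NAME IS NOT NULL;
```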

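Finally, you can also declare COLLATE explicitly when writing a SQL query, overriding any database, table, or column setting. This is not commonly used, so it is enough to know that it exists: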

SELECT DISTINCT field1 COLLATE utf8mb4_general_ci FROM table1;
 
SELECT field1, field2 FROM table1 ORDER BY field1 COLLATE utf8mb4_unicode_ci;

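If all of them are set explicitly, the order of precedence is: SQL statement > column level > table level > database level > instance level. In other words, a COLLATE specified on a column overrides the one specified on the table, and a COLLATE specified on the table overrides the database-level one. If a level specifies nothing, it inherits from the next level up: a column with no COLLATE specified uses the same COLLATE as its table.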
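The precedence order can be made visible with the built-in COLLATION() function, which reports the collation of an expression. This is a sketch under stated assumptions: collate_priority_demo is a hypothetical temporary table whose table default, column setting, and statement-level override are deliberately all different.

```sql
-- Table default: utf8mb4_unicode_ci; the column overrides it with utf8mb4_bin.
CREATE TEMPORARY TABLE collate_priority_demo (
    name VARCHAR(16) COLLATE utf8mb4_bin NOT NULL
) DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_unicode_ci;
INSERT INTO collate_priority_demo (name) VALUES ('a');

-- The column setting wins over the table default (utf8mb4_bin),
-- and an explicit COLLATE in the statement wins over the column (utf8mb4_general_ci).
SELECT COLLATION(name) AS column_collation,
       COLLATION(name COLLATE utf8mb4_general_ci) AS statement_collation
FROM collate_priority_demo;
```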

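That covers the essentials of COLLATE in MySQL. Still, in system design we should try not to let the system depend heavily on the sort order of Chinese text fields, and we should also avoid using Chinese text as a query condition in MySQL where possible.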

source:php.cn