Decoding VARCHAR Lengths and UTF-8 in MySQL
When defining VARCHAR columns in MySQL, it is easy to be unsure what the declared length actually means. Does a VARCHAR(32) column in a UTF-8 table hold 32 bytes or 32 characters? The answer depends on the MySQL version in use.
Before and After MySQL 4.1
In MySQL versions prior to 4.1, VARCHAR lengths were measured in bytes, so a VARCHAR(32) column could store up to 32 bytes of data. Since MySQL 4.1, which includes version 5 and every later release, VARCHAR lengths are interpreted in character units. A VARCHAR(32) column in a UTF-8 table can therefore hold up to 32 characters, regardless of how many bytes each character needs.
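The difference between counting characters and counting bytes can be illustrated outside MySQL. The following Python sketch is only an analogy: it assumes that the number of code points in a Python string approximates MySQL's character count and that the UTF-8 encoded length approximates the bytes stored. The helper names `char_and_byte_length` and `fits_varchar` are hypothetical and are not part of MySQL or of any library.

```python
def char_and_byte_length(text: str) -> tuple:
    """Return (number of characters, number of UTF-8 bytes) for text."""
    return len(text), len(text.encode("utf-8"))


def fits_varchar(text: str, declared_length: int) -> bool:
    """Hypothetical check modelling the MySQL 4.1+ rule:
    the declared length limits characters, not bytes."""
    return len(text) <= declared_length


ascii_value = "a" * 32          # 1 byte per character in UTF-8
accented_value = "\u00e9" * 32  # U+00E9 (e with acute accent), 2 bytes per character in UTF-8

for value in (ascii_value, accented_value):
    chars, size = char_and_byte_length(value)
    print(chars, "characters,", size, "bytes, fits VARCHAR(32):", fits_varchar(value, 32))
```

Both strings are 32 characters long, so under the character-based rule both fit in a VARCHAR(32) column even though the second one occupies twice as many bytes.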
Official MySQL Documentation
To clarify this issue, the official MySQL 5 documentation explicitly states:
"MySQL interprets length specifications in character column definitions in character units. (Before MySQL 4.1, column lengths were interpreted in bytes.) This applies to CHAR, VARCHAR, and the TEXT types."
Impact of UTF-8
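The character set also determines the maximum length that can be declared for a VARCHAR column. MySQL's utf8 character set (also known as utf8mb3) can require up to three bytes per character, so a VARCHAR column using it can be declared with at most 21,844 characters. The limit comes from the maximum row size of 65,535 bytes, which is shared among all columns in the table and must also leave room for the column's length prefix. With utf8mb4, which can require up to four bytes per character, the same calculation gives a lower maximum of 16,383 characters.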
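The arithmetic behind that limit can be sketched as follows. This is a rough calculation under stated assumptions, not MySQL's actual implementation: it assumes a table whose only column is the VARCHAR in question, a 2-byte length prefix, and one extra byte of NULL flags when the column is nullable. The function name `max_varchar_chars` is hypothetical.

```python
MAX_ROW_BYTES = 65_535    # MySQL's maximum row size, shared by all columns
LENGTH_PREFIX_BYTES = 2   # assumed length prefix for a long VARCHAR value
NULL_FLAG_BYTES = 1       # assumed overhead when the column allows NULL


def max_varchar_chars(max_bytes_per_char: int, nullable: bool = True) -> int:
    """Estimate the largest declarable VARCHAR length, in characters,
    for a table that contains only this one column."""
    overhead = LENGTH_PREFIX_BYTES + (NULL_FLAG_BYTES if nullable else 0)
    return (MAX_ROW_BYTES - overhead) // max_bytes_per_char


print(max_varchar_chars(3))  # 3-byte UTF-8 (utf8 / utf8mb3): 21844
print(max_varchar_chars(4))  # 4-byte UTF-8 (utf8mb4): 16383
```

Any other columns in the same table reduce the space available, so the practical limit for a given table is usually lower than these figures.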