Java represents text using Unicode, so a char can store a Chinese character. What is Unicode?
Unicode (known in Chinese by names that translate as Universal Code, International Code, or Unified Code) is an industry standard in computing. It organizes and encodes most of the world's writing systems, so that computers can display and process text in a simpler, uniform way.
Unicode is developed in step with the Universal Character Set (UCS) standard and is also published in book form [1]. It is still being revised, and each new version adds more characters. As of the quoted article, the latest version is 8.0.0 [1], released on June 17, 2015, and it contains more than 100,000 characters (the 100,000th character was adopted in 2005). Besides glyph charts, encoding methods, and standard character encodings, the data covered by Unicode also includes character properties, such as whether a letter is upper or lower case.
The above is quoted from the Wikipedia article on Unicode.
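The character properties mentioned in the quote can be queried from Java's standard `java.lang.Character` class. The following is only a minimal sketch: the class name `CharPropsDemo` and its helper methods are made up for illustration, and the answers depend on the Unicode version bundled with the JDK in use.

```java
class CharPropsDemo {
    // Hypothetical helper: name of the Unicode script a code point belongs to.
    static String scriptOf(int codePoint) {
        return Character.UnicodeScript.of(codePoint).name();
    }

    // Hypothetical helper: true if the code point is an upper-case letter.
    static boolean isUpper(int codePoint) {
        return Character.isUpperCase(codePoint);
    }

    public static void main(String[] args) {
        int latinA = 'A';      // U+0041
        int zhong = 0x4E2D;    // U+4E2D, a common Chinese character

        System.out.println(isUpper(latinA));                   // true
        System.out.println((char) Character.toLowerCase(latinA)); // a
        System.out.println(scriptOf(latinA));                  // LATIN

        // Chinese characters are letters, but they have no case.
        System.out.println(Character.isLetter(zhong));         // true
        System.out.println(isUpper(zhong));                    // false
        System.out.println(scriptOf(zhong));                   // HAN
    }
}
```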
It is not difficult to see from the above that characters do not get into Unicode automatically: each one has to be accepted by the Unicode organization. At present only part of the Chinese, Japanese, and Korean characters are included, and the coverage may not be complete. Java uses Unicode, so any character that the Unicode organization has included can be represented in Java.
Not a very good answer.
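To make the claim more concrete, here is a small sketch of a char holding a Chinese character. The class name `ChineseCharDemo` and the helper `codeOf` are invented for this example. The character is written as a `\u` escape so the result does not depend on the source file's encoding.

```java
class ChineseCharDemo {
    // Hypothetical helper: the numeric value stored in a char.
    static int codeOf(char c) {
        return c;
    }

    public static void main(String[] args) {
        char zhong = '\u4E2D'; // the Chinese character U+4E2D

        // A char is just a 16-bit number; for this character it is 0x4E2D.
        System.out.println(Integer.toHexString(codeOf(zhong))); // 4e2d

        // The largest value a single char can hold.
        System.out.println((int) Character.MAX_VALUE);          // 65535
    }
}
```

Any character whose code is at most 65535 fits in one char this way, which covers the most commonly used Chinese characters.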
When one byte isn't enough, there's UTF-16
A char is stored in 2 bytes. For English letters plus punctuation, 2 bytes are more than enough, but once other scripts such as Chinese are added, the 65,536 values that 2 bytes provide may not be enough. What if 4 bytes were used to represent one character? The range that can be expressed would grow, and 8 bytes is theoretically possible too.
This is the problem that led to the Unicode character set standard.
Characters in Java use Unicode, and a char is 16 bits wide (one UTF-16 code unit).
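What happens to characters that do not fit in 16 bits? In Java's UTF-16 representation they are stored as a pair of char values (a surrogate pair). The sketch below illustrates this with U+20BB7, a Chinese character outside the 16-bit range. The class name `SurrogateDemo` and its helpers are hypothetical, and `Character.BYTES` assumes Java 8 or later.

```java
class SurrogateDemo {
    // Hypothetical helper: number of 16-bit char units used to store the string.
    static int charUnits(String s) {
        return s.length();
    }

    // Hypothetical helper: number of actual Unicode characters (code points).
    static int codePoints(String s) {
        return s.codePointCount(0, s.length());
    }

    public static void main(String[] args) {
        System.out.println(Character.BYTES); // 2
        System.out.println(Character.SIZE);  // 16

        String common = "\u4E2D";                             // U+4E2D fits in one char
        String rare = new String(Character.toChars(0x20BB7)); // U+20BB7 does not

        System.out.println(charUnits(common));  // 1
        System.out.println(charUnits(rare));    // 2
        System.out.println(codePoints(rare));   // 1

        // The first of the two chars is a high surrogate, not a character by itself.
        System.out.println(Character.isHighSurrogate(rare.charAt(0))); // true
    }
}
```

So a single char is not always a whole character: code that must handle every Unicode character should work with code points (for example `String.codePointAt`) rather than individual char values.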