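I looked up a lot of material and ended up confused. Here is what I understand so far:
Java uses Unicode, and a char takes two bytes, covering 0x0000 to 0xFFFF. That is only 65,536 values, which cannot possibly cover every script in the world, so supplementary characters were added; they take four bytes and cover 0x10000 to 0x10FFFF. Now suppose a Chinese character has a code point above 65535: how exactly is it encoded, how many bytes does it take, and how many chars is it?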
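One way to check this empirically is a small sketch like the one below. It assumes U+20000 (the first code point of CJK Unified Ideographs Extension B) as the example character; the class and method names are made up for illustration. A Java String stores UTF-16 code units, so a code point above 0xFFFF should show up as two chars (a surrogate pair) and four bytes in both UTF-16 and UTF-8.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical demo class; the name and helper methods are illustrative only.
class SupplementaryCharDemo {
    // U+20000 is outside the BMP, so it cannot fit in a single 16-bit char.
    static final String HAN = new String(Character.toChars(0x20000));

    // Number of UTF-16 code units (Java chars) in the string.
    static int charCount(String s) { return s.length(); }

    // Number of Unicode code points, i.e. characters as a reader sees them.
    static int codePointCount(String s) { return s.codePointCount(0, s.length()); }

    // Encoded size in UTF-8.
    static int utf8Bytes(String s) { return s.getBytes(StandardCharsets.UTF_8).length; }

    // Encoded size in UTF-16, big-endian, without a byte order mark.
    static int utf16Bytes(String s) { return s.getBytes(StandardCharsets.UTF_16BE).length; }

    // The individual chars in hex, to make the surrogate pair visible.
    static String surrogates(String s) {
        return String.format("%04X %04X", (int) s.charAt(0), (int) s.charAt(1));
    }

    public static void main(String[] args) {
        System.out.println(charCount(HAN));       // 2
        System.out.println(codePointCount(HAN));  // 1
        System.out.println(utf8Bytes(HAN));       // 4
        System.out.println(utf16Bytes(HAN));      // 4
        System.out.println(surrogates(HAN));      // D840 DC00
    }
}
```

So for a character like this, length() reports 2 even though there is only one character; codePointCount is the call to use when counting characters rather than chars.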
public static void main(String[] args) throws Exception{ System.out.println("
Usually we set the encoding to UTF-8. A common Chinese character (one inside the Basic Multilingual Plane) is a single 16-bit char in Java and takes 3 bytes when encoded as UTF-8.
public static void main(String[] args) {
    String str = "测试";
    // getBytes() alone uses the platform default charset
    System.out.println(str.getBytes(java.nio.charset.StandardCharsets.UTF_8).length);
}
Output: 6
For the number of bytes occupied by different encoding formats, please refer to the blog: The number of bytes occupied by different encoding formats
UTF-8: three bytes per Chinese character
GBK: two bytes per Chinese character
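The two figures can be compared side by side with a rough sketch like the following. The class and method names are invented for illustration, and the string is written with Unicode escapes so the result does not depend on the source file encoding. GBK is not among the charsets every Java runtime is required to ship, so the sketch checks for it first.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical helper class; names are illustrative only.
class EncodingByteCount {
    // Number of bytes the string occupies once encoded with the given charset.
    static int byteCount(String s, Charset cs) {
        return s.getBytes(cs).length;
    }

    public static void main(String[] args) {
        String str = "\u6D4B\u8BD5"; // the two characters of "测试"
        System.out.println("chars: " + str.length());
        System.out.println("UTF-8: " + byteCount(str, StandardCharsets.UTF_8));
        // Only query GBK if this runtime actually provides it.
        if (Charset.isSupported("GBK")) {
            System.out.println("GBK: " + byteCount(str, Charset.forName("GBK")));
        } else {
            System.out.println("GBK is not available in this runtime");
        }
    }
}
```

Passing the charset explicitly keeps the result the same across machines, whereas a bare getBytes() follows whatever the platform default happens to be.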