Cause:
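Garbled characters appear when text is decoded with a different character set than the one used to encode it.
The JVM's default character set can be obtained by calling java.nio.charset.Charset.defaultCharset(). On Chinese-language Windows the system default is typically GBK, so a JVM that follows the system default (the behavior before JDK 18; from JDK 18 onward the default is UTF-8 unless file.encoding is set otherwise) uses GBK for encoding and decoding whenever no character set is specified.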
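As a quick check, the default can be printed on the machine in question. This is a minimal sketch; the class and method names are made up for illustration, and the output depends on the JDK version and the operating system settings.

```java
import java.nio.charset.Charset;

class DefaultCharsetDemo {
    // Returns the name of the charset the JVM uses when none is specified.
    static String defaultCharsetName() {
        return Charset.defaultCharset().name();
    }

    public static void main(String[] args) {
        // The value differs between machines, e.g. GBK or UTF-8.
        System.out.println("Default charset: " + defaultCharsetName());
        // The file.encoding system property influences the default.
        System.out.println("file.encoding:   " + System.getProperty("file.encoding"));
    }
}
```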
Inconsistent encoding and decoding is by far the most common cause of garbled text. Consider the following code:
// Code snippet 1
byte[] read = "你好abc".getBytes();     // encode with the default charset
String result = new String(read);      // decode with the default charset
System.out.println(result);
This code involves three steps:
1. Encoding. The string literal is used here only to keep the example short; the situation is the same when bytes are read from another source such as a file or the network. The bytes you read are in whatever encoding the original input stream used. No encoding is specified here, so the default (GBK in this example) is used.
2. Decoding. In the end we always work with String objects, and a String is obtained from a byte array by specifying how those bytes should be decoded. No decoding charset is specified here, so the default (GBK) is used.
3. Output and use of the string. This actually involves another round of encoding and decoding: the output stream encodes the string as GBK, and the console decodes it as GBK before displaying it. Because both sides use the system default character set, they cannot disagree, so this step is not the source of the garbled characters.
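To make step 1 concrete, the sketch below prints the bytes that the same string produces under two encodings. The class and helper names are hypothetical, the string is written with Unicode escapes so that the source file's own encoding does not matter, and it assumes the GBK charset is available in the running JDK.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class EncodedBytesDemo {
    // "你好abc" written with Unicode escapes.
    static final String TEXT = "\u4F60\u597Dabc";

    // Formats a byte array as space-separated uppercase hex pairs.
    static String toHex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            sb.append(String.format("%02X ", b));
        }
        return sb.toString().trim();
    }

    public static void main(String[] args) {
        Charset gbk = Charset.forName("GBK"); // assumption: GBK is installed
        byte[] asGbk = TEXT.getBytes(gbk);
        byte[] asUtf8 = TEXT.getBytes(StandardCharsets.UTF_8);

        // GBK uses 2 bytes per Chinese character, UTF-8 uses 3.
        System.out.println("GBK   (" + asGbk.length + " bytes): " + toHex(asGbk));
        System.out.println("UTF-8 (" + asUtf8.length + " bytes): " + toHex(asUtf8));
    }
}
```

The same text yields different byte sequences, which is why the decoder has to know which encoding produced them.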
Suppose we change the encoding of the input stream:
// Code snippet 2
byte[] read = "你好abc".getBytes("utf-8"); // encode as UTF-8
String result = new String(read);          // decode with the default charset (GBK here)
System.out.println(result);
Let's analyze the steps again:
1. Encoding: UTF-8.
2. Decoding: GBK (the default).
The encoding and decoding are inconsistent, resulting in garbled characters.
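The mismatch can be reproduced on any machine by naming both character sets explicitly instead of relying on the default. This is a sketch with hypothetical class and method names, and it assumes the GBK charset is available in the running JDK.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class MismatchDemo {
    // "你好abc" written with Unicode escapes.
    static final String TEXT = "\u4F60\u597Dabc";

    // Encodes with one charset and decodes with another.
    static String encodeThenDecode(String text, Charset encodeWith, Charset decodeWith) {
        byte[] bytes = text.getBytes(encodeWith);
        return new String(bytes, decodeWith);
    }

    public static void main(String[] args) {
        Charset gbk = Charset.forName("GBK"); // assumption: GBK is installed
        String garbled = encodeThenDecode(TEXT, StandardCharsets.UTF_8, gbk);

        System.out.println("original: " + TEXT);
        System.out.println("garbled:  " + garbled);
        System.out.println("equal?    " + TEXT.equals(garbled));
    }
}
```

The ASCII part "abc" has the same byte values in both encodings, so only the Chinese characters are affected.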
The fix follows directly from this analysis:
// Code snippet 3
byte[] read = "你好abc".getBytes("utf-8");     // encode as UTF-8
String result = new String(read, "utf-8");     // decode as UTF-8 as well
System.out.println(result);
Just change the decoding method to correspond to the encoding.
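A variant of the fix, sketched below, passes the java.nio.charset.StandardCharsets.UTF_8 constant instead of the string "utf-8". The overloads that take a charset name as a String declare the checked UnsupportedEncodingException, while the Charset overloads do not. The class and method names here are illustrative only.

```java
import java.nio.charset.StandardCharsets;

class MatchingCharsetDemo {
    // Encodes and decodes with the same explicitly named charset.
    static String roundTripUtf8(String text) {
        byte[] read = text.getBytes(StandardCharsets.UTF_8);
        return new String(read, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String text = "\u4F60\u597Dabc"; // "你好abc" as Unicode escapes
        String result = roundTripUtf8(text);
        System.out.println(result.equals(text)); // true: nothing was lost
    }
}
```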
Summary:
1. We usually cannot control the encoding of an input stream, so when decoding, find out which encoding the stream actually uses and decode with that. The charset parameter on input-stream-related methods specifies the decoding character set.
2. The encoding of an output stream can be specified as well. If the output (a file, for example) is later read back as an input stream, no garbled characters are produced as long as both the write and the read use the default, or more generally the same, character set. The charset parameter on output-stream-related methods specifies the encoding character set.
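Both points can be sketched with the standard stream wrappers, where the charset argument of OutputStreamWriter selects the encoding and the charset argument of InputStreamReader selects the decoding. The example uses in-memory streams in place of a real file, and the class and method names are hypothetical.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class StreamCharsetDemo {
    // Output side: the charset argument decides how characters become bytes.
    static byte[] write(String text, Charset charset) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (Writer writer = new OutputStreamWriter(out, charset)) {
            writer.write(text);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toByteArray();
    }

    // Input side: the charset argument decides how bytes become characters.
    static String readLine(byte[] data, Charset charset) {
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(new ByteArrayInputStream(data), charset))) {
            return reader.readLine();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        String text = "\u4F60\u597Dabc"; // "你好abc" as Unicode escapes
        byte[] stored = write(text, StandardCharsets.UTF_8);
        String loaded = readLine(stored, StandardCharsets.UTF_8);
        System.out.println(loaded.equals(text)); // true: same charset on both sides
    }
}
```

Naming the charset on both sides removes the dependency on the platform default, so the data still reads correctly when it is moved to a machine with a different default.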