First of all, converting a String to a byte[] and back again directly is very simple:
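public static void main(String[] args) {
    String str = "我是中国人";
    // No charset is given, so the platform default charset is used
    byte[] arr = str.getBytes();
    // Prints the array's type and hash code, not its contents
    System.out.println("打印:" + arr);
    for (byte e : arr) {
        System.out.print(e + " ");
    }
    String str2 = new String(arr);
    System.out.println("\n打印2:" + str2);
}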
On a system whose default charset is GBK, the output of the code above is (打印 means "print"):
打印:[B@15db9742
-50 -46 -54 -57 -42 -48 -71 -6 -56 -53 
打印2:我是中国人
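The output reveals the encoding. The first line, [B@15db9742, is only the array's type and hash code, not its contents. A byte is 8 bits, and in GBK (the default charset on the machine that produced this output) each Chinese character takes two bytes, so five Chinese characters need ten byte values. In UTF-8 the same characters take three bytes each, so the array would have fifteen elements. Turning those numbers back into Chinese characters requires an encoding standard that both sides agree on.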
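To make the byte counts concrete, here is a minimal sketch that encodes the same string with an explicit charset instead of the platform default. The class name EncodingLengthDemo and the helper encode are made up for illustration, and the string is written with Unicode escapes so the result does not depend on how the source file itself is saved. GBK is an optional charset, so the sketch checks for it first.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical demo class, not part of the original article.
class EncodingLengthDemo {
    // Encodes text with the given charset and returns the resulting bytes.
    static byte[] encode(String text, Charset charset) {
        return text.getBytes(charset);
    }

    public static void main(String[] args) {
        // "我是中国人" written as Unicode escapes.
        String str = "\u6211\u662f\u4e2d\u56fd\u4eba";

        // UTF-8 is always available; each of these characters takes three bytes.
        byte[] utf8 = encode(str, StandardCharsets.UTF_8);
        System.out.println("UTF-8: " + utf8.length + " bytes " + Arrays.toString(utf8));

        // GBK may be missing on a trimmed-down runtime, so check before using it.
        if (Charset.isSupported("GBK")) {
            byte[] gbk = encode(str, Charset.forName("GBK"));
            System.out.println("GBK: " + gbk.length + " bytes " + Arrays.toString(gbk));
        }
    }
}
```

Arrays.toString prints the contents of the array, which avoids the [B@... form that appears when a byte[] is concatenated to a String.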
So how does Java handle character encoding?
Java has its own String class, and a String object does not need a code table to be specified. How does it know which character each number represents? Because the character data in a String is stored as Unicode (specifically, as UTF-16 code units). To represent a single character, Java also has the char data type, which has a fixed size of 16 bits (two bytes) and holds values from 0 to 65535. Each char corresponds to one Unicode character in that range; characters beyond it are stored as a pair of chars.
If you want the Unicode values of the characters in a String, you can use the getChars(int srcBegin, int srcEnd, char[] dst, int dstBegin) method to copy them into a char[]. Each element of that char[] is the number assigned to the corresponding character by the Unicode code table.
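As a small sketch of the getChars call described above, the following copies a two-character string into a char[] and prints each value. The class name GetCharsDemo and the helper codeUnits are hypothetical names chosen for this example.

```java
// Hypothetical demo class, not part of the original article.
class GetCharsDemo {
    // Copies the UTF-16 code units of text into a new char[] using getChars.
    static char[] codeUnits(String text) {
        char[] dst = new char[text.length()];
        text.getChars(0, text.length(), dst, 0);
        return dst;
    }

    public static void main(String[] args) {
        String str = "\u4e2d\u56fd"; // "中国" written as Unicode escapes
        for (char c : codeUnits(str)) {
            // Casting a char to int yields its numeric value.
            System.out.printf("U+%04X (%d)%n", (int) c, (int) c);
        }
    }
}
```

For this input the loop prints U+4E2D (20013) and U+56FD (22269). When the whole string is wanted, toCharArray() returns the same values with less ceremony.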
Why does garbled text appear when converting byte[] to String?
As mentioned above, the cause is a mismatch between encoding standards. For example, the Chinese character 当 ("dang") is represented in the GB2312 standard by the two bytes 0xB5 and 0xB1. A system configured for English usually does not use GB2312 as its default charset. If you give it 0xB5, 0xB1, it decodes them with its own default, such as ISO-8859-1 or Windows-1252, and produces two unrelated Western characters. Java always converts bytes to its internal Unicode representation through some charset, either the one you name or the platform default. If that charset differs from the one used to produce the bytes, strange results appear, that is, garbled characters.
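The mismatch can be reproduced with the two bytes from the example. This is a sketch only: the class name MojibakeDemo and the helper decode are invented for illustration, and GB2312 is looked up at run time because it is not one of the charsets every Java runtime is required to ship.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class, not part of the original article.
class MojibakeDemo {
    // Decodes raw bytes with an explicitly named charset.
    static String decode(byte[] bytes, Charset charset) {
        return new String(bytes, charset);
    }

    public static void main(String[] args) {
        byte[] gb = {(byte) 0xB5, (byte) 0xB1};

        // Wrong charset: each byte becomes one Latin-1 character.
        System.out.println(decode(gb, StandardCharsets.ISO_8859_1));

        // Matching charset, if this runtime provides it.
        if (Charset.isSupported("GB2312")) {
            System.out.println(decode(gb, Charset.forName("GB2312")));
        }
    }
}
```

Passing the charset explicitly on both the encoding and the decoding side removes the dependency on whatever default the machine happens to have.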
So how do we solve the garbled-text problem when converting byte[] to String?
That depends on where the byte[] comes from. A common case is an image that must be converted to a byte[] and then to a String so that it can be transmitted elsewhere. The receiver converts the String back to a byte[] and then back into an image.
1. If the byte[] is converted to a String directly, data can be lost, because not every combination of bytes maps to a valid char in a given charset. Invalid sequences are replaced during decoding and cannot be recovered afterwards.
2. Use the common Base64 encoding specification. Base64 represents binary data with an alphabet of 64 printable characters, each carrying 6 bits, so every 3 bytes of input become 4 characters of text (hence the name). There is no need to write it yourself; a ready-made utility class based on Apache Commons Codec looks like this:
import org.apache.commons.codec.binary.Base64;

public class UtilHelper {
    // Base64 string to byte[]
    public static byte[] base64String2ByteFun(String base64Str) {
        return Base64.decodeBase64(base64Str);
    }

    // byte[] to Base64 string
    public static String byte2Base64StringFun(byte[] b) {
        return Base64.encodeBase64String(b);
    }
}
This way, the conversion between byte[] and String is guaranteed to be lossless.
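The difference between the two approaches can be checked with a short round-trip test. The sketch below is an illustration rather than the article's own code: it uses java.util.Base64 from the standard library (available since Java 8) instead of Apache Commons Codec, the class name RoundTripDemo and its helpers are hypothetical, and the four sample bytes are a stand-in for real image data.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.Base64;

// Hypothetical demo class, not part of the original article.
class RoundTripDemo {
    // Round trip through a String using a text charset; this may lose data.
    static byte[] viaString(byte[] data) {
        String s = new String(data, StandardCharsets.UTF_8);
        return s.getBytes(StandardCharsets.UTF_8);
    }

    // Round trip through Base64 text; this preserves every byte.
    static byte[] viaBase64(byte[] data) {
        String s = Base64.getEncoder().encodeToString(data);
        return Base64.getDecoder().decode(s);
    }

    public static void main(String[] args) {
        // Illustrative binary payload: bytes that are not valid UTF-8 text.
        byte[] data = {(byte) 0xFF, (byte) 0xD8, (byte) 0xFF, (byte) 0xE0};

        System.out.println("via String equal: " + Arrays.equals(data, viaString(data)));
        System.out.println("via Base64 equal: " + Arrays.equals(data, viaBase64(data)));
    }
}
```

The first comparison prints false because the invalid bytes are replaced during decoding, while the Base64 comparison prints true.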