This article was prompted by the question "What are the best practices for string concatenation in Java?":
There are many ways to concatenate strings in Java, such as the + operator and the StringBuilder.append method. What are the advantages and disadvantages of each (ideally with some explanation of how each one is implemented)?
From an efficiency standpoint, what are the best practices for string concatenation in Java?
What other best practices are there for string handling?
Without further ado, let's get started. The test environment is as follows:
JDK version: 1.8.0_65
CPU: i7 4790
Memory: 16G
Concatenating directly with +
Look at the code below:
@Test
public void test() {
    String str1 = "abc";
    String str2 = "def";
    logger.debug(str1 + str2);
}
In the code above we use the + operator to concatenate two Strings. The advantage of this approach is obvious: the code is simple and intuitive. Compared with StringBuilder and StringBuffer, however, it is less efficient in most cases (most, not all). To see what the compiler does with this code, we compile it and disassemble the generated bytecode with the javap tool:
public void test();
  Code:
     0: ldc           #5      // String abc
     2: astore_1
     3: ldc           #6      // String def
     5: astore_2
     6: aload_0
     7: getfield      #4      // Field logger:Lorg/slf4j/Logger;
    10: new           #7      // class java/lang/StringBuilder
    13: dup
    14: invokespecial #8      // Method java/lang/StringBuilder."<init>":()V
    17: aload_1
    18: invokevirtual #9      // Method java/lang/StringBuilder.append:(Ljava/lang/String;)Ljava/lang/StringBuilder;
    21: aload_2
    22: invokevirtual #9      // Method java/lang/StringBuilder.append:(Ljava/lang/String;)Ljava/lang/StringBuilder;
    25: invokevirtual #10     // Method java/lang/StringBuilder.toString:()Ljava/lang/String;
    28: invokeinterface #11, 2 // InterfaceMethod org/slf4j/Logger.debug:(Ljava/lang/String;)V
    33: return
The disassembly shows that when + is used to concatenate strings, the compiler rewrites the code at compile time to use the StringBuilder class: it calls append for each operand and finally calls toString. Does this mean we can simply use + everywhere, since the compiler will turn it into a StringBuilder anyway?
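As a minimal sketch of that translation, the method below writes out by hand what javac 8 emits for str1 + str2 according to the bytecode above. The class and method names are illustrative only. Note that on JDK 9 and later javac emits an invokedynamic call (JEP 280) instead of an explicit StringBuilder chain, but the resulting string is the same.

```java
class PlusDesugarDemo {
    // Hand-written equivalent of the bytecode javac 8 generates for str1 + str2.
    static String desugared(String str1, String str2) {
        return new StringBuilder().append(str1).append(str2).toString();
    }

    public static void main(String[] args) {
        String str1 = "abc";
        String str2 = "def";
        String viaPlus = str1 + str2;
        String viaBuilder = desugared(str1, str2);
        System.out.println(viaPlus.equals(viaBuilder)); // true
        System.out.println(viaBuilder);                 // abcdef
    }
}
```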
StringBuilder source code analysis
The answer, naturally, is no. The reason lies in what the StringBuilder class does internally.
Let's start with the constructors of the StringBuilder class:
public StringBuilder() {
    super(16);
}

public StringBuilder(int capacity) {
    super(capacity);
}

public StringBuilder(String str) {
    super(str.length() + 16);
    append(str);
}

public StringBuilder(CharSequence seq) {
    this(seq.length() + 16);
    append(seq);
}
StringBuilder provides four constructors: the no-argument constructor plus three overloads. Each of them ends up calling the parent-class constructor super(int capacity). The parent class is AbstractStringBuilder, whose constructor is as follows:
AbstractStringBuilder(int capacity) {
    value = new char[capacity];
}
As you can see, in JDK 8 StringBuilder stores its data in a char array (as do String and StringBuffer), and capacity specifies the size of that array. Combined with the no-argument constructor, this tells us the default size is 16 characters.
In other words, if the total length of the strings being concatenated does not exceed 16 characters, the default capacity is sufficient and there is little difference between using + and writing the StringBuilder code by hand. For longer results, however, we can construct the StringBuilder ourselves and specify the array size, avoiding repeated expansion and over-allocation of memory.
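The initial capacities implied by the constructors above can be checked with the public StringBuilder.capacity() method. This is a small illustrative sketch; the class name is hypothetical.

```java
class CapacityDemo {
    public static void main(String[] args) {
        // No-argument constructor: default capacity of 16 chars.
        System.out.println(new StringBuilder().capacity());           // 16
        // String constructor: str.length() + 16, here 3 + 16.
        System.out.println(new StringBuilder("abc").capacity());      // 19
        // Explicit capacity: exactly the requested size, so 10,000 appends
        // of a 7-char string fit without any expansion.
        System.out.println(new StringBuilder(7 * 10000).capacity());  // 70000
    }
}
```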
Now let’s take a look at what is done inside the StringBuilder.append method:
@Override
public StringBuilder append(String str) {
    super.append(str);
    return this;
}
It simply delegates to the append method of the parent class:
public AbstractStringBuilder append(String str) {
    if (str == null)
        return appendNull();
    int len = str.length();
    ensureCapacityInternal(count + len);
    str.getChars(0, len, value, count);
    count += len;
    return this;
}
This method calls ensureCapacityInternal: when the total length after appending exceeds the size of the internal array value, the array must be expanded before the append can proceed. The expansion code is as follows:
void expandCapacity(int minimumCapacity) {
    int newCapacity = value.length * 2 + 2;
    if (newCapacity - minimumCapacity < 0)
        newCapacity = minimumCapacity;
    if (newCapacity < 0) {
        if (minimumCapacity < 0) // overflow
            throw new OutOfMemoryError();
        newCapacity = Integer.MAX_VALUE;
    }
    value = Arrays.copyOf(value, newCapacity);
}
When expanding, StringBuilder grows the capacity to twice the current capacity plus 2. This can be costly: if no capacity was specified at construction, the array may end up much larger than needed after expansion, wasting memory. In addition, after choosing the new size the method calls Arrays.copyOf, which copies the existing data into the newly allocated array. This is unavoidable because StringBuilder stores its data in a char array and Java arrays cannot be resized, so the only option is to allocate a new block of memory and copy the existing data into it. Arrays.copyOf ultimately calls System.arraycopy, a native method that operates directly on memory and is therefore much faster than copying element by element in a Java loop. Even so, the cost of allocating large blocks of memory and copying data cannot be ignored.
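The growth rule can be observed from outside through capacity(). In the sketch below (class and helper names are illustrative), a default builder grows from 16 to 16 * 2 + 2 = 34, then to 34 * 2 + 2 = 70; when a single append needs more than the doubled size, the capacity jumps to exactly the required length, matching the `newCapacity = minimumCapacity` branch above.

```java
import java.util.Arrays;

class GrowthDemo {
    // Appends n copies of 'a' in one call and returns the resulting capacity.
    static int capacityAfterAppending(StringBuilder sb, int n) {
        char[] chunk = new char[n];
        Arrays.fill(chunk, 'a');
        sb.append(chunk);
        return sb.capacity();
    }

    public static void main(String[] args) {
        StringBuilder sb = new StringBuilder();               // capacity 16
        System.out.println(capacityAfterAppending(sb, 17));   // 17 > 16: 16 * 2 + 2 = 34
        System.out.println(capacityAfterAppending(sb, 18));   // length 35 > 34: 34 * 2 + 2 = 70
        StringBuilder big = new StringBuilder();              // capacity 16
        System.out.println(capacityAfterAppending(big, 100)); // 34 is too small, so exactly 100
    }
}
```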
Comparing + concatenation with StringBuilder
@Test
public void test() {
    String str = "";
    for (int i = 0; i < 10000; i++) {
        str += "asjdkla";
    }
}
The compiler turns the code above into the equivalent of:
@Test
public void test() {
    String str = "";
    for (int i = 0; i < 10000; i++) {
        str = new StringBuilder().append(str).append("asjdkla").toString();
    }
}
It is clear at a glance that far too many StringBuilder objects are created, and str grows longer with every iteration, so the memory requested each time grows as well. Once str is longer than 16 characters, every iteration has to expand the builder, and in most iterations it has to expand it twice. On top of that, toString creates the String object by calling Arrays.copyOfRange, which copies the data yet again. So a single iteration can involve two expansions and three data copies, which is quite expensive. Compare this with a version that uses a single StringBuilder whose capacity is specified up front:
public void test() {
    StringBuilder sb = new StringBuilder("asjdkla".length() * 10000);
    for (int i = 0; i < 10000; i++) {
        sb.append("asjdkla");
    }
    String str = sb.toString();
}
On my machine this version runs in 0 to 1 ms, while the previous one takes about 380 ms. The difference in efficiency is striking.
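A rough count of characters copied helps explain a gap of this size. Assume str holds 7i characters at the start of iteration i (i = 0, ..., n - 1) and ignore the extra copies made during expansion, so the figure for the + loop is only a lower bound:

```latex
\begin{aligned}
C_{+}(n) &\ge \sum_{i=0}^{n-1}\bigl(\underbrace{7i}_{\text{append(str)}} + \underbrace{7}_{\text{append(piece)}} + \underbrace{7(i+1)}_{\text{toString()}}\bigr)
          = 14\sum_{i=0}^{n-1}(i+1) = 7n(n+1) \\
C_{+}(10\,000) &\ge 7 \cdot 10\,000 \cdot 10\,001 = 700\,070\,000 \\
C_{\text{sb}}(n) &= \underbrace{7n}_{\text{appends}} + \underbrace{7n}_{\text{toString()}} = 14n,
  \qquad C_{\text{sb}}(10\,000) = 140\,000
\end{aligned}
```

The + loop's copying cost grows quadratically with the number of iterations while the presized builder's grows linearly; at n = 10,000 the ratio of characters copied is roughly 5,000 to 1.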
With the same code and the loop count raised to 1,000,000, it takes about 20 ms on my machine when the capacity is specified and about 29 ms when it is not. Even without a specified capacity this is a huge improvement over using + directly (despite 100 times as many iterations), but it still triggers repeated expansions and copies.
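A minimal harness along the following lines can reproduce this kind of comparison. It is a sketch, not the author's test code: the class and method names are hypothetical, the timing is a single shot with no JIT warm-up, and a dedicated benchmarking tool such as JMH gives far more trustworthy numbers. The + variant runs only 10,000 iterations because of its quadratic cost.

```java
import java.util.function.Supplier;

class ConcatTiming {
    static final String PIECE = "asjdkla";

    // += in a loop: each iteration builds a new String from scratch.
    static String withPlus(int n) {
        String str = "";
        for (int i = 0; i < n; i++) {
            str += PIECE;
        }
        return str;
    }

    // Default capacity (16): grows by 2 * capacity + 2, copying on each expansion.
    static String withDefaultBuilder(int n) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < n; i++) {
            sb.append(PIECE);
        }
        return sb.toString();
    }

    // Capacity specified up front: no expansions, one final copy in toString().
    static String withPresizedBuilder(int n) {
        StringBuilder sb = new StringBuilder(PIECE.length() * n);
        for (int i = 0; i < n; i++) {
            sb.append(PIECE);
        }
        return sb.toString();
    }

    // Crude single-shot timing; treat the printed numbers as rough indications only.
    static String time(String label, Supplier<String> work) {
        long start = System.nanoTime();
        String result = work.get();
        long ms = (System.nanoTime() - start) / 1_000_000;
        System.out.println(label + ": " + ms + " ms, length " + result.length());
        return result;
    }

    public static void main(String[] args) {
        time("+ (10,000 iterations)", () -> withPlus(10_000));
        time("default StringBuilder (1,000,000)", () -> withDefaultBuilder(1_000_000));
        time("presized StringBuilder (1,000,000)", () -> withPresizedBuilder(1_000_000));
    }
}
```

All three variants produce the same string; only the amount of allocation and copying differs.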
Switching the code above to StringBuffer takes about 33 ms on my machine. This is because StringBuffer marks most of its methods as synchronized, which guarantees thread safety at the cost of some execution efficiency.
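What that synchronization buys is safe sharing across threads. In the sketch below (names are illustrative), two threads append to one StringBuffer and every character is kept. Running the same code with a StringBuilder is unsafe: it may lose characters or even throw an ArrayIndexOutOfBoundsException, so the outcome is not predictable.

```java
class BufferSafetyDemo {
    // Two threads append to one shared StringBuffer; its synchronized append()
    // means no update is lost, so the final length is exactly 2 * perThread.
    static int sharedAppendLength(int perThread) {
        StringBuffer shared = new StringBuffer();
        Runnable task = () -> {
            for (int i = 0; i < perThread; i++) {
                shared.append('x');
            }
        };
        Thread t1 = new Thread(task);
        Thread t2 = new Thread(task);
        t1.start();
        t2.start();
        try {
            t1.join();
            t2.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return shared.length();
    }

    public static void main(String[] args) {
        System.out.println(sharedAppendLength(100_000)); // 200000
    }
}
```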
Concatenating with String.concat
Now look at this code:
@Test
public void test() {
    String str = "";
    for (int i = 0; i < 10000; i++) {
        // Strings are immutable: concat returns a new String, so assign it back.
        str = str.concat("asjdkla");
    }
}
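This code uses the String.concat method and takes about 130 ms on my machine. That is much better than concatenating with + directly, but still far slower than StringBuilder, so concat might seem pointless. It is not: very often we only need to join two strings rather than many, and in that case String.concat is both more concise and more efficient than StringBuilder.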
public String concat(String str) {
    int otherLen = str.length();
    if (otherLen == 0) {
        return this;
    }
    int len = value.length;
    char buf[] = Arrays.copyOf(value, len + otherLen);
    str.getChars(buf, len);
    return new String(buf, true);
}
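The above is the source of String.concat. The method calls Arrays.copyOf once with length len + otherLen, which amounts to allocating memory once and copying data once from each of the two strings (the receiver str1 and the argument str2). Using a StringBuilder with a specified capacity likewise allocates memory once and copies once from str1 and str2, but the final toString call allocates a new array and copies the data one more time.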
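For the two-string case, the two approaches can be sketched side by side as below (class and method names are illustrative). One behavioral difference is worth knowing: + goes through StringBuilder.append, whose appendNull branch writes the text "null" for a null String, whereas concat calls str.length() on its argument and so throws a NullPointerException.

```java
class TwoStringJoin {
    // One allocation of len1 + len2 chars and two copies; no extra toString() copy.
    static String viaConcat(String a, String b) {
        return a.concat(b);
    }

    // Presized builder: allocation plus two copies, then toString() copies again.
    static String viaBuilder(String a, String b) {
        return new StringBuilder(a.length() + b.length()).append(a).append(b).toString();
    }

    public static void main(String[] args) {
        System.out.println(viaConcat("abc", "def"));  // abcdef
        System.out.println(viaBuilder("abc", "def")); // abcdef

        String nothing = null;
        System.out.println("abc" + nothing);          // abcnull
        try {
            viaConcat("abc", nothing);
        } catch (NullPointerException e) {
            System.out.println("concat(null) threw NullPointerException");
        }
    }
}
```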
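Conclusion
Based on the analysis and tests above:
Avoid concatenating strings in Java with + directly, especially inside loops.
When using StringBuilder or StringBuffer, estimate the capacity as accurately as possible and specify it at construction time, to avoid wasted memory and frequent expansion and copying.
Use StringBuilder when there are no thread-safety concerns; otherwise use StringBuffer.
When joining just two strings, calling String.concat directly performs best.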
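When all the pieces are known in advance, the capacity does not even need to be estimated: it can be computed exactly. The helper below is a hypothetical sketch of that idea; it assumes the list contains no null elements (p.length() would throw) and that the total length fits in an int.

```java
import java.util.Arrays;
import java.util.List;

class PresizedJoin {
    // Hypothetical helper: sums the lengths first so the builder is allocated
    // once at the exact final size and never has to expand.
    static String joinAll(List<String> parts) {
        int total = 0;
        for (String p : parts) {
            total += p.length();
        }
        StringBuilder sb = new StringBuilder(total);
        for (String p : parts) {
            sb.append(p);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        List<String> parts = Arrays.asList("Java", " ", "string", " ", "concatenation");
        System.out.println(joinAll(parts)); // Java string concatenation
    }
}
```

The extra pass over the list is cheap compared with the expansions and copies it avoids.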
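Other best practices for String
When calling equals, always put the value that is known to be non-null on the left, for example "".equals(str) to test for the empty string, which avoids a NullPointerException.
(This next point is here to undercut the previous one.) Use str != null && str.length() != 0 to check that a string is non-empty (its negation, str == null || str.length() == 0, checks for null or empty); this is more efficient than the previous approach.
When converting another object to a String, use String.valueOf(obj) rather than calling obj.toString() directly: the former already checks for null and returns the string "null" instead of throwing a NullPointerException.
Use String.format() to produce formatted string output.
Since JDK 7, switch supports Strings, so when there are many comparisons, use switch instead of a chain of if-else.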
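The practices above can be sketched in a few small methods. All names here are hypothetical; the code only uses standard String methods and the Java 7+ string switch.

```java
class StringPractices {
    // Literal on the left: returns false for null instead of throwing.
    static boolean isEmptyLiteralFirst(String str) {
        return "".equals(str);
    }

    // Explicit null and length check for a non-empty string.
    static boolean isNonEmpty(String str) {
        return str != null && str.length() != 0;
    }

    // String.valueOf(Object) returns "null" for a null reference.
    // (Beware: a bare String.valueOf(null) literal picks the char[] overload and throws.)
    static String describe(Object obj) {
        return String.valueOf(obj);
    }

    // Formatted output: zero-padded 3-digit code followed by a status.
    static String label(int code, String status) {
        return String.format("[%03d] %s", code, status);
    }

    // Java 7+ switch on String; level must not be null, or the switch throws
    // a NullPointerException.
    static int priority(String level) {
        switch (level) {
            case "HIGH":   return 3;
            case "MEDIUM": return 2;
            case "LOW":    return 1;
            default:       return 0;
        }
    }

    public static void main(String[] args) {
        System.out.println(isEmptyLiteralFirst(null)); // false
        System.out.println(isNonEmpty(""));            // false
        System.out.println(describe(null));            // null
        System.out.println(label(7, "ok"));            // [007] ok
        System.out.println(priority("MEDIUM"));        // 2
    }
}
```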