String.intern in Java 6, 7 and 8: String Pooling
This article discusses how the String.intern method is implemented in Java 6 and what changes were made to it in Java 7 and Java 8.
String pooling (also known as string canonicalization) is the process of replacing several String objects that have the same value but different identity with a single shared String object. You can achieve this by keeping your own Map<String, String> (with weak or soft references as needed) and using the values in the map as the canonical values, or you can use String.intern(), which is provided by the JDK.
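To make the idea concrete, here is a minimal sketch of both approaches. The class name PoolingDemo and the helper canonical are made up for illustration, and the map uses strong references for brevity; a real hand-made pool would probably use weak or soft references, as noted above.

```java
import java.util.HashMap;
import java.util.Map;

public class PoolingDemo {
    // Hypothetical hand-made pool: maps each string value to its canonical instance.
    // Strong references only, so entries are never released (an assumption made for brevity).
    private static final Map<String, String> POOL = new HashMap<String, String>();

    static String canonical(String s) {
        String existing = POOL.get(s);
        if (existing != null) {
            return existing;
        }
        POOL.put(s, s);
        return s;
    }

    public static void main(String[] args) {
        String a = new String("hello");
        String b = new String("hello");

        System.out.println(a == b);                        // false: two distinct objects
        System.out.println(a.equals(b));                   // true: same value
        System.out.println(a.intern() == b.intern());      // true: one shared instance from the JVM pool
        System.out.println(canonical(a) == canonical(b));  // true: one shared instance from the map
    }
}
```

Both variants hand back a single representative for equal values, so the duplicates can be garbage collected once nothing else refers to them.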
In the Java 6 era many coding standards prohibited the use of String.intern(), because there was a high chance of getting an OutOfMemoryError if pooling got out of control. Oracle Java 7 made many improvements to the string pool. You can read about them at bugs.sun.com/view_bug.do?bug_id=6962931 and bugs.sun.com/view_bug.do?bug_id=6962930
In the good old days all interned String objects were stored in PermGen, a fixed-size part of the heap mainly used to store loaded classes and the string pool. Besides explicitly interned strings, the PermGen string pool also contained all literal strings used in the program (the important word here is "used": if a class or method was never loaded or called, the constants defined in it were not loaded).
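A small sketch of the difference between literals and strings built at run time (the class name LiteralPoolDemo is illustrative). The behaviour shown is guaranteed by the language and does not depend on where the pool is stored:

```java
public class LiteralPoolDemo {
    // Returns a string literal; literals are placed in the JVM string pool.
    static String literal() {
        return "pooled";
    }

    public static void main(String[] args) {
        String literal = literal();
        // Built at run time, so this is a fresh object that is not in the pool.
        String built = new StringBuilder("poo").append("led").toString();

        System.out.println(literal == "pooled");        // true: equal literals share one pooled instance
        System.out.println(built == literal);           // false: same value, different object
        System.out.println(built.intern() == literal);  // true: intern() returns the pooled instance
    }
}
```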
The biggest problem with the string pool in Java 6 was its location: PermGen. The size of PermGen is fixed and cannot be expanded at run time. You can set it with the -XX:MaxPermSize=N option. As far as I know, the default PermGen size ranges from 32M to 96M depending on the platform. You can increase it, but the size remains fixed once the JVM has started. This limitation required very careful use of String.intern: you had better not intern any uncontrolled user input with this method. This is why string pooling in Java 6 was mostly implemented with manually managed maps.
The string pool is implemented as a hash map in which each bucket holds a list of the strings whose hash values map to that bucket. Some implementation details can be obtained from the Java bug report bugs.sun.com/view_bug.do?bug_id=6962930
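The structure can be pictured with a toy model such as the one below. It is only a sketch of a fixed-size table with chained buckets, not the actual HotSpot code; the class ToyStringTable and its methods are invented for illustration.

```java
import java.util.ArrayList;
import java.util.List;

public class ToyStringTable {
    // The number of buckets is chosen once and never changes.
    private final List<List<String>> buckets;

    public ToyStringTable(int size) {
        buckets = new ArrayList<List<String>>(size);
        for (int i = 0; i < size; i++) {
            buckets.add(new ArrayList<String>());
        }
    }

    /** Returns the stored instance equal to s, adding s itself if none exists yet. */
    public String intern(String s) {
        int index = (s.hashCode() & 0x7fffffff) % buckets.size();
        List<String> chain = buckets.get(index);
        for (String candidate : chain) {   // linear scan of the bucket's list
            if (candidate.equals(s)) {
                return candidate;
            }
        }
        chain.add(s);
        return s;
    }

    /** Length of the longest bucket list. */
    public int longestChain() {
        int max = 0;
        for (List<String> chain : buckets) {
            max = Math.max(max, chain.size());
        }
        return max;
    }

    public static void main(String[] args) {
        ToyStringTable table = new ToyStringTable(1009);
        for (int i = 0; i < 100000; i++) {
            table.intern("value #" + i);
        }
        System.out.println("longest chain: " + table.longestChain());
    }
}
```

Because the table never grows, every lookup has to walk the list in its bucket, so a table that is too small for the number of strings makes each call slower.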
The default pool size is 1009 (it appears in the source code quoted in the bug report mentioned above, and it was increased in Java7u40). It was a constant in the early versions of Java 6 and became configurable between Java6u30 and Java6u41. In Java 7 it has been configurable from the beginning (at least in Java7u02). You need to specify the -XX:StringTableSize=N parameter, where N is the size of the string pool map. Make sure it is a prime number for better performance.
In Java 6 this parameter is not of much help, as you are still limited by the fixed PermGen size. The rest of this discussion leaves Java 6 aside.
In Java 7, on the other hand, you are limited only by a much larger heap memory. This means that you can set the string pool size in advance (the value depends on your application's needs). As a rule, you start worrying about memory consumption once it reaches hundreds of megabytes. In that situation, allocating 8-16M of memory to a string pool with 1 million entries seems reasonable (do not use 1,000,000 as the -XX:StringTableSize value, because it is not a prime number; use 1,000,003 instead).
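If you want to derive a prime table size from a rough estimate, a helper along these lines is enough. This is a sketch using plain trial division; the class and method names are hypothetical.

```java
public class TableSizeHelper {
    /** Trial division; fast enough for values in the range of realistic table sizes. */
    static boolean isPrime(int n) {
        if (n < 2) {
            return false;
        }
        for (long d = 2; d * d <= n; d++) {
            if (n % d == 0) {
                return false;
            }
        }
        return true;
    }

    /** Smallest prime that is greater than or equal to n. */
    static int nextPrimeAtLeast(int n) {
        int candidate = Math.max(n, 2);
        while (!isPrime(candidate)) {
            candidate++;
        }
        return candidate;
    }

    public static void main(String[] args) {
        System.out.println(isPrime(1000000));          // false
        System.out.println(nextPrimeAtLeast(1000000)); // 1000003
    }
}
```

The result can then be passed to the JVM as -XX:StringTableSize=1000003.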
You can expect interned strings to be distributed fairly evenly across the buckets of the map; see my earlier experiments with hashCode method tuning.
You must set a larger -XX:StringTableSize value (compared to the default 1009) if you want to use String.intern() heavily. Otherwise the performance of this method will quickly degrade as the lists in the buckets grow.
I did not notice any dependency of the intern time on string length for strings shorter than 100 characters (I think duplicates of even 50-character strings are unlikely in real data, so 100 characters seems to be a good test limit).
Here are the application logs for the default pool size: the first column is the number of strings already interned, the second column is the time (in seconds) taken to intern 10,000 strings.
0; time = 0.0 sec
50000; time = 0.03 sec
100000; time = 0.073 sec
150000; time = 0.13 sec
200000; time = 0.196 sec
250000; time = 0.279 sec
300000; time = 0.376 sec
350000; time = 0.471 sec
400000; time = 0.574 sec
450000; time = 0.666 sec
500000; time = 0.755 sec
550000; time = 0.854 sec
600000; time = 0.916 sec
650000; time = 1.006 sec
700000; time = 1.095 sec
750000; time = 1.273 sec
800000; time = 1.248 sec
850000; time = 1.446 sec
900000; time = 1.585 sec
950000; time = 1.635 sec
1000000; time = 1.913 sec
Testing was conducted on a Core i5-3317U @ 1.7 GHz CPU. You can see that the time grows linearly, and with the JVM string pool containing a million strings I could still intern only about 5,000 strings per second, which is too slow for an application that handles a lot of data in memory.
Now rerun the test with the -XX:StringTableSize=100003 parameter:
50000; time = 0.017 sec
100000; time = 0.009 sec
150000; time = 0.01 sec
200000; time = 0.009 sec
250000; time = 0.007 sec
300000; time = 0.008 sec
350000; time = 0.009 sec
400000; time = 0.009 sec
450000; time = 0.01 sec
500000; time = 0.013 sec
550000; time = 0.011 sec
600000; time = 0.012 sec
650000; time = 0.015 sec
700000; time = 0.015 sec
750000; time = 0.01 sec
800000; time = 0.01 sec
850000; time = 0.011 sec
900000; time = 0.011 sec
950000; time = 0.012 sec
1000000; time = 0.012 sec
As you can see, the time to insert strings is now approximately constant (the average length of a bucket's string list does not exceed 10). Here are the results with the same setting, but this time we insert 10 million strings into the pool (which means the bucket lists hold 100 strings on average):
2000000; time = 0.024 sec
3000000; time = 0.028 sec
4000000; time = 0.053 sec
5000000; time = 0.051 sec
6000000; time = 0.034 sec
7000000; time = 0.041 sec
8000000; time = 0.089 sec
9000000; time = 0.111 sec
10000000; time = 0.123 sec
Now let's increase the pool size to 1 million (1,000,003 to be precise):
1000000; time = 0.005 sec
2000000; time = 0.005 sec
3000000; time = 0.005 sec
4000000; time = 0.004 sec
5000000; time = 0.004 sec
6000000; time = 0.009 sec
7000000; time = 0.01 sec
8000000; time = 0.009 sec
9000000; time = 0.009 sec
10000000; time = 0.009 sec
As you can see, the times are flat and do not differ much from the "0 to 1 million" table. With a large enough pool size, even my notebook adds about 1,000,000 strings per second.
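The figures quoted in this section follow from simple arithmetic on the logs. The sketch below reproduces them; the class and method names are made up, and the average list length assumes the strings are spread evenly over the buckets.

```java
public class PoolMath {
    /** Average number of strings per bucket, assuming an even spread. */
    static double averageChainLength(long strings, long tableSize) {
        return (double) strings / tableSize;
    }

    /** Strings interned per second, given the time taken by one batch. */
    static double stringsPerSecond(int batchSize, double batchSeconds) {
        return batchSize / batchSeconds;
    }

    public static void main(String[] args) {
        // Default table (1009 buckets) with 1M strings: roughly a thousand strings per bucket.
        System.out.println(averageChainLength(1000000, 1009));
        // 100003 buckets: about 10 strings per bucket for 1M strings, about 100 for 10M.
        System.out.println(averageChainLength(1000000, 100003));
        System.out.println(averageChainLength(10000000, 100003));
        // Throughput, with 10,000 strings per logged batch.
        System.out.println(stringsPerSecond(10000, 1.913)); // last batch with the default table
        System.out.println(stringsPerSecond(10000, 0.009)); // last batch with 1,000,003 buckets
    }
}
```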
Now we need to compare the JVM string pool with a WeakHashMap<String, WeakReference<String>>, which can be used to emulate the JVM string pool. The following method is used in place of String.intern:
private static final WeakHashMap<String, WeakReference<String>> s_manualCache =
    new WeakHashMap<String, WeakReference<String>>( 100000 );

private static String manualIntern( final String str )
{
    final WeakReference<String> cached = s_manualCache.get( str );
    if ( cached != null )
    {
        final String value = cached.get();
        if ( value != null )
            return value;
    }
    s_manualCache.put( str, new WeakReference<String>( str ) );
    return str;
}
Here are the results of the same test for the manual pool:
0; manual time = 0.001 sec
50000; manual time = 0.03 sec
100000; manual time = 0.034 sec
150000; manual time = 0.008 sec
200000; manual time = 0.019 sec
250000; manual time = 0.011 sec
300000; manual time = 0.011 sec
350000; manual time = 0.008 sec
400000; manual time = 0.027 sec
450000; manual time = 0.008 sec
500000; manual time = 0.009 sec
550000; manual time = 0.008 sec
600000; manual time = 0.008 sec
650000; manual time = 0.008 sec
700000; manual time = 0.008 sec
750000; manual time = 0.011 sec
800000; manual time = 0.007 sec
850000; manual time = 0.008 sec
900000; manual time = 0.008 sec
950000; manual time = 0.008 sec
1000000; manual time = 0.008 sec
A manually written pool provides good performance as long as the JVM has enough memory. Unfortunately, in my test, which stores very short strings (String.valueOf(0 < N < 1,000,000,000)), it allowed me to keep only about 2.5M such strings with the -Xmx1280M parameter. The JVM string pool (size=1,000,003), on the other hand, showed the same performance characteristics while the JVM had enough memory, until the pool contained 12.72M strings and consumed all the memory (5 times more). In my opinion, it is well worth removing all manual string pooling from your application.
Java7u40 increased the default string pool size to 60013 (a much needed performance update). This value allows the pool to hold approximately 30,000 unique strings without significant collisions. Generally speaking, this is sufficient for the data that really deserves to be pooled. You can check this value with the -XX:+PrintFlagsFinal JVM parameter.
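The value can also be read from inside a running application. The sketch below assumes a HotSpot-based JVM that exposes the com.sun.management.HotSpotDiagnosticMXBean interface (available from Java 7 on); the class name StringTableSizeProbe is made up, and on other JVMs the method simply returns null.

```java
import com.sun.management.HotSpotDiagnosticMXBean;
import java.lang.management.ManagementFactory;

public class StringTableSizeProbe {
    /** Returns the current StringTableSize value as text, or null if it cannot be read. */
    static String stringTableSize() {
        try {
            HotSpotDiagnosticMXBean bean =
                ManagementFactory.getPlatformMXBean(HotSpotDiagnosticMXBean.class);
            return bean.getVMOption("StringTableSize").getValue();
        } catch (RuntimeException e) {
            // Not a HotSpot JVM, or the option is unknown to this JVM (assumption: treat both as "unknown").
            return null;
        }
    }

    public static void main(String[] args) {
        System.out.println("StringTableSize = " + stringTableSize());
    }
}
```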
I tried running the same tests on the initial release of Java 8, which still supports the -XX:StringTableSize parameter and performs comparably to Java 7. The main difference is that the default pool size in Java 8 is increased to 60013:
50000; time = 0.019 sec
100000; time = 0.009 sec
150000; time = 0.009 sec
200000; time = 0.009 sec
250000; time = 0.009 sec
300000; time = 0.009 sec
350000; time = 0.011 sec
400000; time = 0.012 sec
450000; time = 0.01 sec
500000; time = 0.013 sec
550000; time = 0.013 sec
600000; time = 0.014 sec
650000; time = 0.018 sec
700000; time = 0.015 sec
750000; time = 0.029 sec
800000; time = 0.018 sec
850000; time = 0.02 sec
900000; time = 0.017 sec
950000; time = 0.018 sec
1000000; time = 0.021 sec
The test code for this article is very simple: a method creates and interns new strings in a loop and measures how long it takes to intern each batch of 10,000 strings. It is best to run this test with the -verbose:gc JVM parameter to see when and how garbage collection occurs. It is also better to use the -Xmx parameter to cap the maximum size of the heap.
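There are two tests here. testStringPoolGarbageCollection shows that the JVM string pool is garbage collected; check the garbage collection log messages. With the default PermGen size in Java 6 this test will fail, so it is best to increase that value, update the test method, or use Java 7.
The second test shows how many interned strings can be kept in memory. On Java 6, run it with two different memory settings, for example -Xmx128M and -Xmx1280M (10 times more). You will probably find that this value does not affect the number of strings you can put into the pool. In Java 7, on the other hand, you can fill the whole heap with your strings.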
import java.lang.ref.WeakReference;
import java.util.ArrayList;
import java.util.List;
import java.util.WeakHashMap;

/**
 * Testing String.intern.
 *
 * Run this class at least with -verbose:gc JVM parameter.
 */
public class InternTest {
    public static void main( String[] args ) {
        testStringPoolGarbageCollection();
        testLongLoop();
    }

    /**
     * Use this method to see where interned strings are stored
     * and how many of them can you fit for the given heap size.
     */
    private static void testLongLoop()
    {
        test( 1000 * 1000 * 1000 );
        //uncomment the following line to see the hand-written cache performance
        //testManual( 1000 * 1000 * 1000 );
    }

    /**
     * Use this method to check that not used interned strings are garbage collected.
     */
    private static void testStringPoolGarbageCollection()
    {
        //first method call - use it as a reference
        test( 1000 * 1000 );
        //we are going to clean the cache here.
        System.gc();
        //check the memory consumption and how long does it take to intern strings
        //in the second method call.
        test( 1000 * 1000 );
    }

    private static void test( final int cnt )
    {
        final List<String> lst = new ArrayList<String>( 100 );
        long start = System.currentTimeMillis();
        for ( int i = 0; i < cnt; ++i )
        {
            final String str = "Very long test string, which tells you about something " +
                "very-very important, definitely deserving to be interned #" + i;
            //uncomment the following line to test dependency from string length
            // final String str = Integer.toString( i );
            lst.add( str.intern() );
            if ( i % 10000 == 0 )
            {
                System.out.println( i + "; time = " + ( System.currentTimeMillis() - start ) / 1000.0 + " sec" );
                start = System.currentTimeMillis();
            }
        }
        System.out.println( "Total length = " + lst.size() );
    }

    private static final WeakHashMap<String, WeakReference<String>> s_manualCache =
        new WeakHashMap<String, WeakReference<String>>( 100000 );

    private static String manualIntern( final String str )
    {
        final WeakReference<String> cached = s_manualCache.get( str );
        if ( cached != null )
        {
            final String value = cached.get();
            if ( value != null )
                return value;
        }
        s_manualCache.put( str, new WeakReference<String>( str ) );
        return str;
    }

    private static void testManual( final int cnt )
    {
        final List<String> lst = new ArrayList<String>( 100 );
        long start = System.currentTimeMillis();
        for ( int i = 0; i < cnt; ++i )
        {
            final String str = "Very long test string, which tells you about something " +
                "very-very important, definitely deserving to be interned #" + i;
            lst.add( manualIntern( str ) );
            if ( i % 10000 == 0 )
            {
                System.out.println( i + "; manual time = " + ( System.currentTimeMillis() - start ) / 1000.0 + " sec" );
                start = System.currentTimeMillis();
            }
        }
        System.out.println( "Total length = " + lst.size() );
    }
}
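Stay away from the String.intern() method in Java 6, because it uses a fixed-size memory area (PermGen).
Java 7 and 8 implement the string pool in heap memory. This means that the memory limit of the string pool is the same as the memory limit of the application.
Use -XX:StringTableSize in Java 7 and 8 to set the size of the string pool map. The size is fixed, because the pool is implemented as a hash map. Estimate the number of distinct strings in your application (those you intend to intern) and set the pool size to a prime number close to that value multiplied by 2 (to reduce the likelihood of collisions). This lets String.intern run in constant time while consuming less memory per interned string (the same task done with a Java WeakHashMap would consume 4-5 times more memory).
The default value of the -XX:StringTableSize parameter is 1009 in Java 6 and in Java 7 before Java7u40. From Java7u40 onward it is 60013 (Java 8 uses the same value).
If you are not sure about your string pool usage, try the -XX:+PrintStringTableStatistics JVM parameter. It prints string pool usage information when your application exits.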