Sharing various techniques to improve Java code performance

String.intern in Java 6,7,8 – String Pool

This article discusses how the String.intern method was implemented in Java 6, and what changes were made to it in Java 7 and Java 8.

String pooling

String pooling (also known as string canonicalization) is the process of replacing several String objects that have the same value but different identities with a single shared String object. You can achieve this with your own Map<String, String> (using weak or soft references as needed), treating the values in the map as the canonical values, or you can use String.intern(), which is provided by the JDK.
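
To make the idea concrete, here is a minimal sketch of what pooling gives you. The class name InternIdentityDemo and the helper sharePooledInstance are made up for this illustration; only String.intern() comes from the JDK.

```java
// Minimal demonstration of string pooling with String.intern().
// InternIdentityDemo and sharePooledInstance are hypothetical names.
class InternIdentityDemo {

    // Returns true when both arguments resolve to the same pooled object.
    static boolean sharePooledInstance(String a, String b) {
        return a.intern() == b.intern();
    }

    public static void main(String[] args) {
        // new String(...) forces two distinct objects with equal contents.
        String first = new String("customer-42");
        String second = new String("customer-42");

        System.out.println(first == second);                    // false: different objects
        System.out.println(first.equals(second));               // true: equal values
        System.out.println(sharePooledInstance(first, second)); // true: one canonical object
    }
}
```

After interning, equal values share one object, so the duplicates can be garbage collected once nothing else refers to them.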

Many coding standards prohibited the use of String.intern() in the Java 6 era, because a pool that grew out of control had a high chance of triggering an OutOfMemoryError. Oracle Java 7 made many improvements to the string pool. You can read about them at bugs.sun.com/view_bug.do?bug_id=6962931 and bugs.sun.com/view_bug.do?bug_id=6962930

String.intern() in Java 6

In the good old days all interned String objects were stored in PermGen, a fixed-size part of the heap mainly used for loaded classes and the string pool. Besides explicitly interned strings, the PermGen string pool also contained all literal strings used in the program (the important word here is "used": if a class or method was never loaded or called, the constants defined in it were not loaded).

The biggest problem with the string pool in Java 6 was its location: PermGen. The size of PermGen is fixed and cannot be expanded at runtime. You can set it with the -XX:MaxPermSize=N option. As far as I know, the default PermGen size ranges from 32M to 96M depending on the platform. You can increase it, but the size is still fixed once the JVM has started. This limitation required you to be very careful with String.intern: you had better not use it to intern any uncontrolled user input. This is why string pooling in Java 6 was mostly implemented with manually managed maps.

String.intern() in Java 7

In Java 7, Oracle engineers made a big change to the string pool: it was moved to the heap. This means you are no longer limited by a separate fixed-size memory area. All strings are now stored in the heap like other ordinary objects, so the heap size is the only thing you need to adjust when tuning your application. This change is reason enough to reconsider using String.intern() in Java 7.

Data in the string pool will be garbage collected

Yes, all strings in the JVM string pool are eligible for garbage collection if nothing in the application references them any more. This applies to all the Java versions discussed here: if an interned string goes out of scope and has no remaining references, it will be garbage collected from the JVM's string pool.

Because it now lives in the heap and is garbage collected, the JVM string pool seems to be the right place for all your strings, doesn't it? In theory, unused strings will be collected from the pool, and strings in use will save memory whenever an equal string arrives from external input. Looks like a perfect memory saving strategy? Before you answer, you need to know how the string pool is implemented.

Implementation of JVM string pool in Java 6, 7, 8

The string pool is implemented as a fixed-capacity hash map in which each bucket contains a list of the strings that share the same hash value. Some implementation details can be found in the Java bug report bugs.sun.com/view_bug.do?bug_id=6962930
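
The structure described above can be sketched as a toy model. This is not the HotSpot source code: FixedBucketPool, its methods, and the use of String.hashCode() for bucket selection are assumptions made purely for illustration.

```java
import java.util.LinkedList;
import java.util.List;

// Toy model of a fixed-capacity string pool (hypothetical, not JVM code):
// the bucket array never grows, so every extra string lengthens a bucket list.
class FixedBucketPool {
    private final List<String>[] buckets;

    @SuppressWarnings("unchecked")
    FixedBucketPool(int bucketCount) {
        buckets = new List[bucketCount];
        for (int i = 0; i < bucketCount; i++) {
            buckets[i] = new LinkedList<String>();
        }
    }

    // Returns the canonical instance, adding the string if it is not pooled yet.
    String intern(String s) {
        List<String> bucket = buckets[indexFor(s)];
        for (String pooled : bucket) { // linear scan of the bucket list
            if (pooled.equals(s)) {
                return pooled;
            }
        }
        bucket.add(s);
        return s;
    }

    // Length of the list that a lookup for s has to scan.
    int bucketLength(String s) {
        return buckets[indexFor(s)].size();
    }

    private int indexFor(String s) {
        // floorMod keeps the index non-negative for negative hash codes
        return Math.floorMod(s.hashCode(), buckets.length);
    }

    public static void main(String[] args) {
        FixedBucketPool pool = new FixedBucketPool(1009);
        String a = pool.intern(new String("hello"));
        String b = pool.intern(new String("hello"));
        System.out.println(a == b); // true: the second call finds the pooled instance
    }
}
```

Because the number of buckets is fixed, the cost of each lookup grows with the length of the bucket lists, which is why the table size matters so much in the measurements that follow.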

The default pool size is 1009 (it appears in the source code of the bug report mentioned above and was increased in Java7u40). It was a constant in early versions of Java 6 and became configurable somewhere between Java6u30 and Java6u41. In Java 7 it has been configurable from the beginning (at least it is configurable in Java7u02). You need to specify the parameter -XX:StringTableSize=N, where N is the size of the string pool map. Make sure N is a prime number for better performance.

In Java 6 this parameter is of little help, because you are still limited by the fixed PermGen size. The rest of this discussion ignores Java 6.

Java 7 (until Java7u40)

In Java 7, on the other hand, you are limited only by the much larger heap. This means you can set the string pool size to a fairly high value in advance (the right value depends on your application's needs). Generally speaking, you start worrying about memory consumption once the data set grows to at least several hundred megabytes. In that situation, allocating 8-16M of memory to a string pool with one million entries seems a reasonable trade-off (do not use 1,000,000 as the value of -XX:StringTableSize, because it is not a prime number; use 1,000,003 instead).
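
If you want to derive a prime table size from a target number of entries, a small helper is enough. This is only a sketch: StringTableSizing, isPrime and nextPrime are hypothetical names, and plain trial division is assumed to be fast enough for values of this magnitude.

```java
// Hypothetical helper for picking a prime -XX:StringTableSize value.
class StringTableSizing {

    // Trial division is sufficient for table sizes in the millions.
    static boolean isPrime(long n) {
        if (n < 2) return false;
        if (n % 2 == 0) return n == 2;
        for (long d = 3; d * d <= n; d += 2) {
            if (n % d == 0) return false;
        }
        return true;
    }

    // Smallest prime that is greater than or equal to n.
    static long nextPrime(long n) {
        long candidate = Math.max(n, 2);
        while (!isPrime(candidate)) {
            candidate++;
        }
        return candidate;
    }

    public static void main(String[] args) {
        long size = nextPrime(1_000_000);
        System.out.println("-XX:StringTableSize=" + size); // prints -XX:StringTableSize=1000003
    }
}
```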

You can expect interned strings to be distributed fairly evenly across the buckets of the map; see my earlier experiments with hashCode method tuning.

You must set a larger -XX:StringTableSize value (compared to the default 1009) if you want to use String.intern() heavily. Otherwise the performance of this method will quickly degrade to that of searching a linked list.

I did not notice any dependence on string length when interning strings shorter than 100 characters (I think that a string made of 50 repeated characters does not resemble real data, so 100 characters seems to be a good test limit).

Here are the application logs for the default pool size: the first column is the number of strings already interned, the second column is the time taken to intern 10,000 strings (in seconds).

0; time = 0.0 sec
50000; time = 0.03 sec
100000; time = 0.073 sec
150000; time = 0.13 sec
200000; time = 0.196 sec
250000; time = 0.279 sec
300000; time = 0.376 sec
350000; time = 0.471 sec
400000; time = 0.574 sec
450000; time = 0.666 sec
500000; time = 0.755 sec
550000; time = 0.854 sec
600000; time = 0.916 sec
650000; time = 1.006 sec
700000; time = 1.095 sec
750000; time = 1.273 sec
800000; time = 1.248 sec
850000; time = 1.446 sec
900000; time = 1.585 sec
950000; time = 1.635 sec
1000000; time = 1.913 sec


Testing was conducted on a Core i5-3317U@1.7Ghz CPU. You can see that the time grows linearly: with a million strings in the JVM string pool, I can still intern only about 5000 strings per second, which is too slow for an application that has to handle a lot of data in memory.
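
The linear growth is what you would expect from the bucket lists getting longer. As a rough sketch, assuming strings are spread evenly over the buckets (ChainLengthEstimate and averageChainLength are hypothetical names), the average list length is simply the number of strings divided by the number of buckets:

```java
import java.util.Locale;

// Back-of-the-envelope estimate of bucket list length, assuming an even spread.
class ChainLengthEstimate {

    static double averageChainLength(long strings, long buckets) {
        return (double) strings / buckets;
    }

    public static void main(String[] args) {
        // Default table (1009 buckets) holding one million strings: about 991 per bucket.
        System.out.printf(Locale.ROOT, "%.1f%n", averageChainLength(1_000_000, 1009));
        // The same number of strings in a table of 100003 buckets.
        System.out.printf(Locale.ROOT, "%.1f%n", averageChainLength(1_000_000, 100_003));
    }
}
```

With the default table, every intern call on a full pool has to walk a list of roughly a thousand entries, which matches the slowdown in the log above.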

Now set -XX:StringTableSize=100003 and re-run the test:

50000; time = 0.017 sec
100000; time = 0.009 sec
150000; time = 0.01 sec
200000; time = 0.009 sec
250000; time = 0.007 sec
300000; time = 0.008 sec
350000; time = 0.009 sec
400000; time = 0.009 sec
450000; time = 0.01 sec
500000; time = 0.013 sec
550000; time = 0.011 sec
600000; time = 0.012 sec
650000; time = 0.015 sec
700000; time = 0.015 sec
750000; time = 0.01 sec
800000; time = 0.01 sec
850000; time = 0.011 sec
900000; time = 0.011 sec
950000; time = 0.012 sec
1000000; time = 0.012 sec


As you can see, the time to insert strings is now approximately constant (the average length of a bucket's string list does not exceed 10). Here are the results with the same settings, but this time inserting 10 million strings into the pool (which means that the average bucket list contains about 100 strings):

2000000; time = 0.024 sec
3000000; time = 0.028 sec
4000000; time = 0.053 sec
5000000; time = 0.051 sec
6000000; time = 0.034 sec
7000000; time = 0.041 sec
8000000; time = 0.089 sec
9000000; time = 0.111 sec
10000000; time = 0.123 sec


Now let's increase the pool size to 1 million (1,000,003 to be precise):

1000000; time = 0.005 sec
2000000; time = 0.005 sec
3000000; time = 0.005 sec
4000000; time = 0.004 sec
5000000; time = 0.004 sec
6000000; time = 0.009 sec
7000000; time = 0.01 sec
8000000; time = 0.009 sec
9000000; time = 0.009 sec
10000000; time = 0.009 sec


As you can see, the times are flat and do not differ much from the "0 to 1 million" table. With a large enough pool size, even my notebook can add about 1,000,000 new strings per second.
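
The throughput figures quoted in this article follow directly from the log lines, since each line reports the time for one batch of 10,000 strings. A small sketch of that arithmetic (InternThroughput and its methods are hypothetical names introduced here):

```java
// Converts a log line such as "1000000; time = 0.009 sec" into strings per second.
class InternThroughput {

    // Extracts the number between '=' and " sec".
    static double secondsFromLogLine(String line) {
        int eq = line.indexOf('=');
        int end = line.lastIndexOf(" sec");
        return Double.parseDouble(line.substring(eq + 1, end).trim());
    }

    static double stringsPerSecond(int batchSize, double seconds) {
        return batchSize / seconds;
    }

    public static void main(String[] args) {
        int batch = 10_000; // the test prints one line per 10,000 interned strings
        double slow = secondsFromLogLine("1000000; time = 1.913 sec"); // default table size
        double fast = secondsFromLogLine("1000000; time = 0.009 sec"); // large table size
        System.out.println(Math.round(stringsPerSecond(batch, slow)));
        System.out.println(Math.round(stringsPerSecond(batch, fast)));
    }
}
```

The first value comes out at a little over 5,000 strings per second and the second at a little over 1.1 million, in line with the figures mentioned in the text.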

Do we still need to manually manage the string pool?

Now we need to compare the JVM string pool with a WeakHashMap<String, WeakReference<String>>, which can be used to emulate the JVM string pool. The following method is used as a replacement for String.intern:

private static final WeakHashMap<String, WeakReference<String>> s_manualCache = 
    new WeakHashMap<String, WeakReference<String>>( 100000 );

private static String manualIntern( final String str )
{
    final WeakReference<String> cached = s_manualCache.get( str );
    if ( cached != null )
    {
        final String value = cached.get();
        if ( value != null )
            return value;
    }
    s_manualCache.put( str, new WeakReference<String>( str ) );
    return str;
}

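As a quick usage sketch, the manual pool behaves like String.intern() from the caller's point of view. The class name ManualPoolDemo is made up, and manualIntern is repeated here only so that the example compiles on its own.

```java
import java.lang.ref.WeakReference;
import java.util.WeakHashMap;

// Usage sketch for the manual pool; ManualPoolDemo is a hypothetical wrapper class.
class ManualPoolDemo {
    private static final WeakHashMap<String, WeakReference<String>> CACHE =
        new WeakHashMap<String, WeakReference<String>>( 100000 );

    static String manualIntern( final String str )
    {
        final WeakReference<String> cached = CACHE.get( str );
        if ( cached != null )
        {
            final String value = cached.get();
            if ( value != null )
                return value;
        }
        // Not pooled yet (or already collected): this instance becomes canonical.
        CACHE.put( str, new WeakReference<String>( str ) );
        return str;
    }

    public static void main( String[] args )
    {
        final String first = manualIntern( new String( "order-1001" ) );
        final String second = manualIntern( new String( "order-1001" ) );
        // true: 'first' is still strongly referenced, so its entry is still in the map
        System.out.println( first == second );
    }
}
```

Note that WeakHashMap is not synchronized, so a pool like this would need external locking if several threads intern strings at the same time.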

Here are the results of the same test for the manual pool:

0; manual time = 0.001 sec
50000; manual time = 0.03 sec
100000; manual time = 0.034 sec
150000; manual time = 0.008 sec
200000; manual time = 0.019 sec
250000; manual time = 0.011 sec
300000; manual time = 0.011 sec
350000; manual time = 0.008 sec
400000; manual time = 0.027 sec
450000; manual time = 0.008 sec
500000; manual time = 0.009 sec
550000; manual time = 0.008 sec
600000; manual time = 0.008 sec
650000; manual time = 0.008 sec
700000; manual time = 0.008 sec
750000; manual time = 0.011 sec
800000; manual time = 0.007 sec
850000; manual time = 0.008 sec
900000; manual time = 0.008 sec
950000; manual time = 0.008 sec
1000000; manual time = 0.008 sec


A manually written pool provides comparable performance when the JVM has enough memory. Unfortunately, for my test case of very short strings (String.valueOf(0 < N < 1,000,000,000)), it let me keep only about 2.5M such strings with -Xmx1280M. The JVM string pool (size=1,000,003), on the other hand, kept the same flat performance characteristics until the JVM ran out of memory with 12.72M strings in the pool (5 times more). In my opinion, this is a strong hint to remove all manual string pooling from your application.

String.intern() in Java 7u40+ and Java 8

Java7u40 increased the default string pool size to 60013 (this was a much-needed performance update). This value allows the pool to hold approximately 30,000 unique strings. Generally speaking, this is enough for the data that is really worth interning. You can check the value with the -XX:+PrintFlagsFinal JVM parameter.
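
If you would rather read the value from inside a running program, the following sketch may help. It assumes a HotSpot-based JVM that exposes the com.sun.management.HotSpotDiagnosticMXBean interface; other JVMs may not provide it. The class name StringTableSizeProbe is hypothetical.

```java
import com.sun.management.HotSpotDiagnosticMXBean;
import java.lang.management.ManagementFactory;

// Reads the effective StringTableSize flag of the current JVM (HotSpot only).
class StringTableSizeProbe {

    static String currentStringTableSize() {
        HotSpotDiagnosticMXBean bean =
            ManagementFactory.getPlatformMXBean(HotSpotDiagnosticMXBean.class);
        // getVMOption throws IllegalArgumentException if the flag is unknown to this JVM
        return bean.getVMOption("StringTableSize").getValue();
    }

    public static void main(String[] args) {
        System.out.println("StringTableSize = " + currentStringTableSize());
    }
}
```

The printed number depends on the JVM version and on any -XX:StringTableSize setting passed at startup.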

I ran the same test on the initial release of Java 8, which still supports the -XX:StringTableSize parameter in the same way as Java 7. The main difference from the earlier Java 7 runs is that the default pool size in Java 8 is 60013:

50000; time = 0.019 sec
100000; time = 0.009 sec
150000; time = 0.009 sec
200000; time = 0.009 sec
250000; time = 0.009 sec
300000; time = 0.009 sec
350000; time = 0.011 sec
400000; time = 0.012 sec
450000; time = 0.01 sec
500000; time = 0.013 sec
550000; time = 0.013 sec
600000; time = 0.014 sec
650000; time = 0.018 sec
700000; time = 0.015 sec
750000; time = 0.029 sec
800000; time = 0.018 sec
850000; time = 0.02 sec
900000; time = 0.017 sec
950000; time = 0.018 sec
1000000; time = 0.021 sec


Test code

The test code for this article is very simple: a method creates and retains new strings in a loop, and measures how long it takes to intern each batch of 10,000 strings. It is best to run this test with the -verbose:gc JVM parameter to see when and how garbage collection occurs. It is also advisable to set the maximum heap size with the -Xmx parameter.

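There are two tests here. testStringPoolGarbageCollection shows that the JVM string pool is garbage collected; check the garbage collection log messages. On Java 6 with the default PermGen size this test will fail, so it is best to increase that value, update the test method, or use Java 7.

The second test shows how many interned strings can be kept in memory. Run it on Java 6 with two different memory settings, for example -Xmx128M and -Xmx1280M (10 times more). You will probably find that this value does not affect the number of strings that fit in the pool. In Java 7, on the other hand, you can fill the whole heap with your strings.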

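import java.lang.ref.WeakReference;
import java.util.ArrayList;
import java.util.List;
import java.util.WeakHashMap;

/**
 * Testing String.intern.
 *
 * Run this class at least with -verbose:gc JVM parameter.
 */
public class InternTest {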
    public static void main( String[] args ) {
        testStringPoolGarbageCollection();
        testLongLoop();
    }

    /**
     * Use this method to see where interned strings are stored
     * and how many of them can you fit for the given heap size.
     */
    private static void testLongLoop()
    {
        test( 1000 * 1000 * 1000 );
        //uncomment the following line to see the hand-written cache performance
        //testManual( 1000 * 1000 * 1000 );
    }

    /**
     * Use this method to check that not used interned strings are garbage collected.
     */
    private static void testStringPoolGarbageCollection()
    {
        //first method call - use it as a reference
        test( 1000 * 1000 );
        //we are going to clean the cache here.
        System.gc();
        //check the memory consumption and how long does it take to intern strings
        //in the second method call.
        test( 1000 * 1000 );
    }

    private static void test( final int cnt )
    {
        final List<String> lst = new ArrayList<String>( 100 );
        long start = System.currentTimeMillis();
        for ( int i = 0; i < cnt; ++i )
        {
            final String str = "Very long test string, which tells you about something " +
                "very-very important, definitely deserving to be interned #" + i;
            //uncomment the following line to test the dependence on string length
            //final String str = Integer.toString( i );
            lst.add( str.intern() );
            if ( i % 10000 == 0 )
            {
                System.out.println( i + "; time = " + ( System.currentTimeMillis() - start ) / 1000.0 + " sec" );
                start = System.currentTimeMillis();
            }
        }
        System.out.println( "Total length = " + lst.size() );
    }

    private static final WeakHashMap<String, WeakReference<String>> s_manualCache =
        new WeakHashMap<String, WeakReference<String>>( 100000 );

    private static String manualIntern( final String str )
    {
        final WeakReference<String> cached = s_manualCache.get( str );
        if ( cached != null )
        {
            final String value = cached.get();
            if ( value != null )
                return value;
        }
        s_manualCache.put( str, new WeakReference<String>( str ) );
        return str;
    }

    private static void testManual( final int cnt )
    {
        final List<String> lst = new ArrayList<String>( 100 );
        long start = System.currentTimeMillis();
        for ( int i = 0; i < cnt; ++i )
        {
            final String str = "Very long test string, which tells you about something " +
                "very-very important, definitely deserving to be interned #" + i;
            lst.add( manualIntern( str ) );
            if ( i % 10000 == 0 )
            {
                System.out.println( i + "; manual time = " + ( System.currentTimeMillis() - start ) / 1000.0 + " sec" );
                start = System.currentTimeMillis();
            }
        }
        System.out.println( "Total length = " + lst.size() );
    }
}


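Summary

Do not use String.intern() on Java 6, because the string pool lives in a fixed-size memory area (PermGen).
Java 7 and 8 implement the string pool in the heap. This means that the memory limit of the string pool is the memory limit of the whole application.
Use -XX:StringTableSize in Java 7 and 8 to set the size of the string pool map. The size is fixed, because the pool is implemented as a hash map with a fixed number of buckets. Estimate the number of distinct strings in your application (those you intend to intern) and set the pool size to a prime number close to that value multiplied by 2 (to reduce the likelihood of collisions). This lets String.intern run in constant time and use less memory per interned string (a Java WeakHashMap used for the same task consumes 4-5 times more memory).
The default value of the -XX:StringTableSize parameter is 1009 in Java 6 and in Java 7 before Java7u40. From Java7u40 onwards it is 60013 (Java 8 uses the same value).
If you are not sure how much of the string pool is used, try the -XX:+PrintStringTableStatistics JVM parameter. It prints the string pool usage when your application exits.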
