Detailed explanation of optimizing garbage collection for mission-critical Java applications (Part 2)

Optimizing garbage collection for mission-critical Java applications (Part 1)

Concurrent Mark Sweep (CMS) collector

The CMS garbage collector was the first widely used low-latency collector. Although it has been available since Java 1.4.2, it was not very stable at first. These problems were not resolved until Java 5.

As the name of the CMS collector suggests, it works concurrently: most of the collection work is done by a GC thread that runs concurrently with the worker threads handling user requests. The single stop-the-world collection of the old generation is split into two shorter stop-the-world pauses plus 5 concurrent phases. During these concurrent phases, the worker threads run as usual (without being paused).

Use the following parameter to activate the CMS collector:

-XX:+UseConcMarkSweepGC
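
Whether the flag actually took effect can be checked from inside the application. The following is a minimal sketch using the standard `java.lang.management` API; the class name `ActiveCollectors` is made up for illustration, and the collector names mentioned in the comment are what HotSpot JVMs typically report, not something the API guarantees.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: lists the garbage collectors registered in the running JVM.
final class ActiveCollectors {

    /** Returns the names of the collectors the running JVM has registered. */
    static List<String> names() {
        List<String> names = new ArrayList<>();
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            names.add(gc.getName());
        }
        return names;
    }

    public static void main(String[] args) {
        // Assumption: with -XX:+UseConcMarkSweepGC a HotSpot JVM typically
        // reports "ParNew" and "ConcurrentMarkSweep"; names are JVM-specific.
        for (String name : names()) {
            System.out.println(name);
        }
    }
}
```

Running this once at startup and logging the result is a cheap way to catch a start script that silently dropped the GC flag.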

Running the same test program again (with increased load) gives the following results:

Figure 4 GC behavior of a JVM with optimized heap sizes using CMS over 50 hours (-Xms1200m -Xmx1200m -XX:NewSize=400m -XX:MaxNewSize=400m -XX:SurvivorRatio=6 -XX:+UseConcMarkSweepGC)

The 8 s old generation GC pauses have disappeared. Each old generation collection now involves only two pauses (the previous run produced 5 pauses in 50 hours), and all pauses stay below 1 second.

By default, the CMS collector uses ParNew to collect the new generation. When ParNew runs together with CMS, its pauses are slightly longer than without CMS because of the extra coordination required between the two. Compared with the previous test results, this shows up as a slight increase in the average new generation pause time. It also shows in the frequent outliers among the new generation pauses, which can reach around 0.5 s. These pauses are still short enough for many applications, so the CMS/ParNew combination is a good low-latency option.

A serious flaw of the CMS collector is that it cannot start once the old generation is full. At that point it is too late for CMS; the virtual machine must fall back to the usual "stop-the-world" strategy (a "concurrent mode failure" record appears in the GC log). To meet the low-latency goal, the CMS collector should therefore be started when old generation occupancy reaches a certain threshold, which is set as follows:

-XX:CMSInitiatingOccupancyFraction=80

This means that the CMS collector starts once the old generation is 80% occupied. For our application this value, which is also the default, works well. If the threshold is set too high, "concurrent mode failure" occurs, resulting in long old generation GC pauses. If it is set too low (below the size of the live data), CMS may run concurrently all the time, fully occupying one CPU core for GC. If an application's object creation and heap usage change rapidly, for example because special tasks are started interactively or by timers, it is difficult to find a threshold that avoids both problems.
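
The threshold comparison described above can be illustrated in a few lines. This is only a sketch of the arithmetic, not how the JVM implements it: the class `OldGenOccupancy` and its methods are invented for illustration, and matching the old generation pool by the substring "Old Gen" is an assumption about HotSpot pool names.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryUsage;

// Hypothetical helper: compares old generation occupancy with a threshold.
final class OldGenOccupancy {

    /** Occupancy of a pool in percent, or -1 if the capacity is unknown. */
    static double occupancyPercent(long usedBytes, long capacityBytes) {
        if (capacityBytes <= 0) {
            return -1.0;
        }
        return 100.0 * usedBytes / capacityBytes;
    }

    /** True once occupancy has reached the initiating threshold (e.g. 80). */
    static boolean reachedThreshold(long usedBytes, long capacityBytes, int thresholdPercent) {
        double occupancy = occupancyPercent(usedBytes, capacityBytes);
        return occupancy >= 0 && occupancy >= thresholdPercent;
    }

    public static void main(String[] args) {
        int thresholdPercent = 80; // mirrors -XX:CMSInitiatingOccupancyFraction=80
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            // Assumption: the old generation pool name contains "Old Gen"
            // (for example "CMS Old Gen"); names differ between JVMs.
            if (!pool.getName().contains("Old Gen")) {
                continue;
            }
            MemoryUsage usage = pool.getUsage();
            if (usage == null) {
                continue; // pool is no longer valid
            }
            // getMax() is -1 when undefined; fall back to committed memory.
            long capacity = usage.getMax() > 0 ? usage.getMax() : usage.getCommitted();
            System.out.printf("%s: %.1f%% used, threshold reached: %b%n",
                    pool.getName(),
                    occupancyPercent(usage.getUsed(), capacity),
                    reachedThreshold(usage.getUsed(), capacity, thresholdPercent));
        }
    }
}
```

Sampling this value periodically shows how close the application runs to the threshold, which helps when judging whether 80% leaves enough headroom for the concurrent phases to finish.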

The Shadow of Fragmentation

However, one of the biggest problems with CMS is that it does not compact the old generation heap space. This creates heap fragmentation, which over time can lead to severe service degradation. Two factors contribute to it: a tight old generation, and frequent CMS collections. The first can be mitigated by making the old generation larger than the ParallelGC collector would require (I increased the heap from 1024M to 1200M, as can be seen from the first few figures). The second can be mitigated by sizing the generations appropriately, as described earlier. We have already seen how much this reduces the frequency of old generation GC.

To show how important it is to size the generations sensibly before switching to CMS, let's look at what happens when this principle is ignored and the CMS collector is applied directly to the configuration of Figure 1 (almost no heap tuning):

Figure 5 GC behavior with CMS and without optimized heap sizes: performance deteriorates because of memory fragmentation (starting at hour 14)

With this configuration the JVM runs stably under the load test for nearly 14 hours (in production, under lighter load, this deceptively benign phase may last longer). Then, suddenly, a series of long GC pauses sets in, consuming almost half of the remaining time. Old generation pauses exceed 10 seconds, and even new generation pauses reach several seconds, because the collector has to spend a long time searching the old generation for free space when it moves objects from the new generation to the old generation.

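The price of CMS's low latency is memory fragmentation. The problem can be minimized, but it never disappears completely, and you never know when it will strike. With sensible tuning and monitoring, however, the risk can be kept under control.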
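
One simple form of such monitoring is to scan the GC log for the failure records. The sketch below counts lines containing "concurrent mode failure", the record named earlier in the text. It also counts "promotion failed", which is my addition: HotSpot logs that message when a new generation collection cannot move objects into the old generation. The class name `GcLogScan` and the idea of passing the log path as the first argument are illustrative assumptions.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.List;

// Hypothetical helper: counts GC log lines that indicate a CMS fallback.
final class GcLogScan {

    /** Counts lines that report a concurrent mode failure or a failed promotion. */
    static int countFailures(List<String> logLines) {
        int count = 0;
        for (String line : logLines) {
            if (line.contains("concurrent mode failure") || line.contains("promotion failed")) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: java GcLogScan <gc-log-file>");
            return;
        }
        // Reads the whole file at once; acceptable for a sketch, but a
        // streaming reader would be preferable for very large logs.
        List<String> lines = Files.readAllLines(Paths.get(args[0]), StandardCharsets.UTF_8);
        System.out.println("failure records: " + countFailures(lines));
    }
}
```

A count that rises from zero after days of stable operation is the kind of early signal that the fragmentation scenario of Figure 5 may be developing.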

The Promise of the G1 (Garbage First) Collector

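The G1 collector was designed to deliver low latency without the risk of heap fragmentation. Oracle therefore positions it as the long-term replacement for CMS. G1 avoids the fragmentation risk because it compacts the heap. As for GC pauses, G1's goal is not to minimize pause time but to set an upper bound and keep GC pauses within that bound as far as possible.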

Before applying the G1 collector to the test program and comparing it with the classic collectors above, here are two important points about G1.

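Oracle has supported G1 since Java 7u4. To use G1 you should update Java 7 to the latest release. Oracle's GC team has been working on G1 continuously, and in the most recent Java updates (7u7 to 7u9 at the time of writing) the improvements to G1 are significant. On the other hand, G1 cannot be used in any Java 6 release, and so far there is no sign that the far superior Java 7 implementation will be backported to Java 6.
The earlier advice on tuning generation sizes is obsolete for G1. Setting generation sizes conflicts with setting a pause time target and pulls G1 away from its design goal. With G1, you set the overall memory size with "-Xms" and "-Xmx" and, optionally, a GC pause time target; no other options are needed. Similar to the AdaptiveSizePolicy of the ParallelGC collector, G1 adjusts the generation sizes adaptively to meet the pause time target.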
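
The second point lends itself to an automated sanity check at startup. The following sketch flags generation-sizing options that are combined with G1. The class `G1FlagCheck` is hypothetical, and the list of prefixes is my own selection of HotSpot sizing options (the three used earlier in this article plus `-Xmn` and `-XX:NewRatio`), not an official list of conflicting flags.

```java
import java.lang.management.ManagementFactory;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: finds generation-sizing flags used together with G1.
final class G1FlagCheck {

    // Assumption: these prefixes cover the common generation-sizing options.
    private static final String[] SIZING_PREFIXES = {
        "-Xmn", "-XX:NewSize=", "-XX:MaxNewSize=", "-XX:NewRatio=", "-XX:SurvivorRatio="
    };

    /** Returns the sizing flags present alongside -XX:+UseG1GC, in input order. */
    static List<String> conflictingFlags(List<String> jvmArgs) {
        List<String> conflicts = new ArrayList<>();
        if (!jvmArgs.contains("-XX:+UseG1GC")) {
            return conflicts; // G1 not requested, nothing to report
        }
        for (String arg : jvmArgs) {
            for (String prefix : SIZING_PREFIXES) {
                if (arg.startsWith(prefix)) {
                    conflicts.add(arg);
                    break;
                }
            }
        }
        return conflicts;
    }

    public static void main(String[] args) {
        // Inspects the options the current JVM was actually started with.
        List<String> jvmArgs = ManagementFactory.getRuntimeMXBean().getInputArguments();
        for (String flag : conflictingFlags(jvmArgs)) {
            System.out.println("Sizing flag may work against the G1 pause target: " + flag);
        }
    }
}
```

A check like this is mainly useful when an existing start script tuned for CMS or ParallelGC is reused for G1 and the old sizing flags are left in place by accident.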

Following these principles, the G1 collector produces the following results with its default configuration:

Figure 6 GC behavior of a JVM under G1 over 26 hours with a minimal configuration (-Xms1024m -Xmx1024m -XX:+UseG1GC)

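In this example we used the default GC pause time target of 200 ms. The figure shows that the average pause time is reasonably close to this target, and the longest GC pause is about the same as with the CMS collector (Figure 4). G1 clearly keeps GC pauses well under control, with relatively few outliers compared with the average pause duration.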

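On the other hand, the average GC pause is much longer than with the CMS collector (270 ms vs 100 ms), and pauses are more frequent. As a result, the accumulated GC pause time (that is, the share of total time spent in GC itself) is more than 4 times that of CMS (6.96% vs 1.66%).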

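Like CMS, G1 has GC pause phases and concurrent collection phases (which do not pause the application). Also like CMS, it starts the concurrent phases only once heap occupancy reaches a certain threshold. Figure 6 shows that the 1 GB of available memory has not been fully used so far. This is because G1's default occupancy threshold is much lower than that of CMS. It has also been reported that G1 can generally make do with a smaller heap.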

Quantitative Comparison of Garbage Collectors

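The table below summarizes the key performance figures measured for the 4 most important garbage collectors in Oracle Java 7. The same load test was run against the same application, but at different load levels (reflected in the garbage creation rate in column 2).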

Table: Comparison of several garbage collectors

All collectors ran on a 1 GB heap. The traditional collectors (ParallelGC, ParNewGC and CMS) additionally used the following heap settings:

-XX:NewSize=400m -XX:MaxNewSize=400m -XX:SurvivorRatio=6

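The G1 collector ran without any additional heap size settings and with the default pause time target of 200 ms, which can also be set explicitly: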

-XX:MaxGCPauseMillis=200

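The table shows that the traditional collectors spend about the same time on new generation collection (column 3). This holds for ParallelGC and ParNewGC, and CMS in fact also uses ParNewGC to collect the new generation. However, during a new generation GC pause, moving surviving objects into the old generation requires coordination between ParNewGC and CMS. This coordination adds overhead, which makes new generation GC pauses slightly longer under CMS.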

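Column 7 gives the time spent in GC pauses as a percentage of total time. This value is a good measure of the overall time cost of GC, because the total concurrent GC time (last column) and the CPU load it causes are negligible. As described earlier, old generation collections become rare once the heap sizes have been tuned, so the value in column 7 is dominated by the total new generation pause time. That total is the product of the duration of a new generation pause (column 3) and the number of pauses. The frequency of new generation pauses depends on the size of the new generation, which is the same (400 MB) for all traditional collectors. For the traditional collectors, column 7 therefore more or less mirrors column 3 (at comparable load).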
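
The relationship behind column 7 (pause duration times pause count, divided by elapsed time) can be written down directly. The sketch below is an illustration only: the class `GcOverhead` is invented, and the `main` method estimates the same percentage for the running JVM from the standard management beans, whose notion of "collection time" is JVM-specific and may not match pause time exactly.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

// Hypothetical helper: share of elapsed time spent in GC pauses.
final class GcOverhead {

    /** Pause overhead in percent: average pause * number of pauses / elapsed time. */
    static double pauseOverheadPercent(double avgPauseMillis, long pauseCount, long elapsedMillis) {
        if (elapsedMillis <= 0) {
            throw new IllegalArgumentException("elapsedMillis must be positive");
        }
        return 100.0 * avgPauseMillis * pauseCount / elapsedMillis;
    }

    public static void main(String[] args) {
        long totalGcMillis = 0;
        long totalCollections = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            // Both getters return -1 when the value is undefined for a collector.
            if (gc.getCollectionTime() > 0) {
                totalGcMillis += gc.getCollectionTime();
            }
            if (gc.getCollectionCount() > 0) {
                totalCollections += gc.getCollectionCount();
            }
        }
        long uptimeMillis = Math.max(1, ManagementFactory.getRuntimeMXBean().getUptime());
        double avgPauseMillis = totalCollections == 0
                ? 0.0
                : (double) totalGcMillis / totalCollections;
        System.out.printf("GC overhead since JVM start: %.2f%%%n",
                pauseOverheadPercent(avgPauseMillis, totalCollections, uptimeMillis));
    }
}
```

For example, 720 pauses of 50 ms each within one hour add up to 36 s of pause time, so `pauseOverheadPercent(50.0, 720, 3600000)` evaluates to 1.0 (percent). These numbers are purely illustrative and are not taken from the test runs.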

The advantages of CMS can be clearly seen in column 6: it trades a slightly longer total time cost for shorter (one order of magnitude lower) old generation GC pauses. For many real-world applications, this is a good compromise.

So, how does the G1 collector perform for our application? As can be seen in column 6 (and column 5), the G1 collector does a better job than the CMS collector in reducing the old generation GC pause time. But as you can also see from column 7, it pays a very high price: under the same load, the total time cost of GC accounts for 7%, while CMS only accounts for 1.6%.

In subsequent articles, I will examine the conditions that cause G1 to incur a higher GC time cost, and analyze the advantages and disadvantages of G1 compared with other collectors (especially the CMS collector). This is a large and worthwhile topic.

Summary and Outlook

For all classic Java GC algorithms (SerialGC, ParallelGC, ParNewGC and CMS), it is very important to optimize the heap space size of each generation. In many real-world applications, however, this tuning is not done adequately. The result is suboptimal application performance and operational degradation (performance loss and, without good monitoring, even periods in which the program stalls).

Optimizing the heap space size of each generation can significantly improve application performance and minimize the number of long GC pauses. Beyond that, eliminating long GC pauses requires a low-latency collector. CMS has been (until now) the preferred and efficient low-latency collector, and in many cases it is sufficient. With reasonable tuning it can ensure long-term stability, but the risk of heap fragmentation remains.

As an alternative, the G1 collector is currently (Java 7u9) a supported and usable option, but there is still room for improvement. Its results are acceptable for many applications, but do not yet match those of the CMS collector. The details of its advantages and disadvantages deserve careful study.
