Java performance has a reputation for being something of a dark art. Part of the reason is that the Java platform is very complex, and in many cases problems are hard to localize. However, there has also been a historical tendency to study Java performance on the basis of folklore and anecdote rather than applied statistics and empirical reasoning. In this article, I hope to debunk some of the most egregious of these technical myths.
1. Java is slow
There are many fallacies about Java’s performance, and this one is the most outdated and probably the most obvious.
It’s true that in the 1990s and early 2000s, Java was sometimes slow.
However, virtual machines and JIT technology have improved for more than ten years since then, and the overall performance of Java is now very good.
Across six independent sets of web performance benchmarks, Java frameworks placed in the top four in 22 of the 24 tests.
Although the JVM uses performance profiling to only optimize commonly used code paths, the optimization effect is obvious. In many cases, JIT-compiled Java code is as fast as C++, and this is happening more and more.
Despite this, some people still think that the Java platform is slow. This may come from the historical prejudice of people who have experienced early versions of the Java platform.
Before drawing conclusions, we recommend staying objective and evaluating the latest performance results.
2. A single line of Java code can be viewed in isolation
Consider the following short line of code:
MyObject obj = new MyObject();
To a Java developer it seems obvious that this line must allocate an object and call the appropriate constructor.
From that we might try to derive performance bounds: we assume the line necessarily causes a fixed amount of work to be performed, and on that presumption we calculate its performance impact.
In fact, this understanding is wrong. It presupposes that the work, whatever it is, will be carried out under all circumstances.
In fact, both javac and the JIT compiler can optimize dead code away. As far as the JIT compiler is concerned, the code can even be optimized out speculatively, based on profiling data. In such cases the line never runs at all, so it has no performance impact.
Additionally, in some JVMs, such as JRockit, the JIT compiler can decompose operations on objects, so that the allocation can be avoided even when the code path remains live.
The moral here is that context is very important when dealing with Java performance issues, and premature optimization has the potential to produce counterintuitive results. So it’s best not to optimize too early. Instead, always build your code and use performance tuning techniques to locate performance hot spots and then improve them.
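To make this concrete, here is a hedged sketch. The class and method names are invented for illustration, and whether these optimizations actually fire depends entirely on the JVM, its version, and the profiling data it has collected:

```java
// Illustration only: whether these allocations survive compilation depends
// on the JVM version and its profiling data.
public class AllocationContext {

    static class MyObject {
        final int value;
        MyObject(int value) { this.value = value; }
    }

    // The allocation here is dead code: the object is never used. javac keeps
    // it, but the JIT compiler is free to eliminate it entirely, in which
    // case this line has no performance impact at all.
    public static int deadAllocation() {
        MyObject obj = new MyObject(42); // may never be allocated at runtime
        return 0;
    }

    // Here the object never escapes the method, so a JIT compiler with
    // escape analysis may replace it with its scalar fields (no heap
    // allocation), even though the code path is live.
    public static int nonEscapingAllocation(int x) {
        MyObject obj = new MyObject(x);
        return obj.value + 1;
    }

    public static void main(String[] args) {
        System.out.println(deadAllocation());
        System.out.println(nonEscapingAllocation(41));
    }
}
```

Nothing in the source of either method tells you whether an allocation happens at runtime; only the surrounding context, as seen by the JIT compiler, decides that.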
3. A microbenchmark measures what you think it measures
As we saw above, examining a small piece of code is not as accurate as analyzing the overall performance of the application.
Despite this, developers love writing microbenchmarks. It seems like there's a lot of fun tinkering with some aspect of the underlying platform.
Richard Feynman once said: “The first principle is that you must not fool yourself, and you are the easiest person to fool.” Nothing illustrates this better than the writing of Java microbenchmarks.
Writing good microbenchmarks is extremely difficult. The Java platform is very complex, and many microbenchmarks can only measure transient effects or other unexpected aspects of the Java platform.
For example, an inexperienced benchmark writer often ends up measuring timer granularity or garbage collection behavior rather than the effect the benchmark set out to capture.
Only developers and development teams with actual needs should write microbenchmarks. These benchmarks should be fully public (including source code), reproducible, and subject to peer review and further scrutiny.
Because of the JVM’s many optimizations, a single run can give very different results from a statistical series of runs. To get a true and reliable answer, run each benchmark many times and then aggregate the results.
If readers feel the need to write micro-benchmarks, the paper "Statistically Rigorous Java Performance Evaluation" by Georges, Buytaert, and Eeckhout is a good start. Without proper statistical analysis, we can easily be misled.
There are mature benchmarking tools with communities built around them (such as Google’s Caliper). If you really need to write a microbenchmark, don’t write it on your own; you need the opinions and experience of your peers.
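In practice a harness such as Caliper (or the OpenJDK’s JMH) should handle warmup, repetition, and statistics for you. Purely as a sketch of the “run many times and aggregate” advice, with an invented workload() standing in for the code under test:

```java
import java.util.Arrays;

// Minimal sketch only: real microbenchmarks should use a peer-reviewed
// harness such as JMH or Caliper. workload() is a placeholder.
public class NaiveBenchmark {

    public static long workload() {
        long sum = 0;
        for (int i = 0; i < 1_000_000; i++) sum += i;
        return sum;
    }

    public static void main(String[] args) {
        int warmup = 20, runs = 30;
        long[] nanos = new long[runs];
        long sink = 0; // consume results so the JIT cannot discard the work

        // Warmup phase: let the JIT compile and optimize the hot path first.
        for (int i = 0; i < warmup; i++) sink += workload();

        // Measured phase: many runs, not one.
        for (int i = 0; i < runs; i++) {
            long start = System.nanoTime();
            sink += workload();
            nanos[i] = System.nanoTime() - start;
        }

        // Aggregate across runs instead of trusting a single measurement.
        Arrays.sort(nanos);
        long median = nanos[runs / 2];
        double mean = Arrays.stream(nanos).average().orElse(0);
        System.out.printf("median=%dns mean=%.0fns sink=%d%n", median, mean, sink);
    }
}
```

Even this sketch omits much of what a real harness does (forked JVMs, dead-code and constant-folding traps, confidence intervals), which is exactly why writing your own is discouraged.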
4. Slow algorithms are the most common cause of performance problems
There is a very common cognitive error among developers (and the general public as well): believing that the part of the system they control is the important part.
This cognitive error is also reflected when discussing Java performance: Java developers believe that the quality of the algorithm is the main cause of performance problems. Developers think about code, so they naturally tend to think about their own algorithms.
In fact, when dealing with a series of real-life performance problems, the chance that people find that algorithm design is the fundamental problem is less than 10%.
Conversely, garbage collection, database access, and configuration errors are more likely to cause application slowness than algorithms.
Most applications process relatively small amounts of data, so even if the main algorithm is not efficient, it usually does not cause serious performance problems. To be sure, our algorithm is not optimal; nevertheless, the performance problems caused by the algorithm are still relatively small, and more performance problems are caused by other parts of the application stack.
So our best advice is to use actual production data to uncover the true cause of performance issues. Measure performance data, don't guess!
5. Caching can solve all problems
“All problems in computer science can be solved by another level of indirection.”
This programmer’s motto from David Wheeler (on the Internet, the quote is attributed to at least two other computer scientists as well) is very common, especially among web developers.
When the existing architecture is not thoroughly understood and analysis has stalled, the fallacy that “caching can solve all problems” often rears its head.
In the developers’ view, rather than grapple with a frightening legacy system, it is better to put a cache in front of it, hide the system, and hope for the best. Undoubtedly, this approach only makes the overall architecture more complex, and the next developer who takes over and tries to understand the state of the system will be even worse off.
Large-scale and poorly designed systems often lack overall design and are written one line of code and one subsystem at a time. In many cases, however, simplifying and refactoring the architecture results in better performance and is almost always easier to understand.
So when evaluating whether it is really necessary to add caching, you should first plan to collect some basic usage statistics (such as hit rate and miss rate, etc.) to prove the real value brought by the caching layer.
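A minimal sketch of that kind of instrumentation, with an invented loader function standing in for the expensive backing call (not thread-safe, and key/value types are placeholders):

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

// Sketch only: wraps an expensive lookup and records hit/miss counts so
// the value of a caching layer can be demonstrated with real numbers
// instead of hope. Not thread-safe; null values are not supported.
public class CountingCache<K, V> {
    private final Map<K, V> cache = new HashMap<>();
    private final Function<K, V> loader;
    private long hits, misses;

    public CountingCache(Function<K, V> loader) { this.loader = loader; }

    public V get(K key) {
        V value = cache.get(key);
        if (value != null) {
            hits++;                    // served from cache
            return value;
        }
        misses++;
        value = loader.apply(key);     // the expensive backing call
        cache.put(key, value);
        return value;
    }

    public double hitRate() {
        long total = hits + misses;
        return total == 0 ? 0.0 : (double) hits / total;
    }

    public long hits()   { return hits; }
    public long misses() { return misses; }
}
```

If the measured hit rate turns out to be low, the cache is adding complexity without paying for itself, and that is worth knowing before the next developer inherits it.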
6. All applications need to pay attention to the Stop-The-World issue
There is an unchangeable fact on the Java platform: in order to run garbage collection, all application threads must be paused periodically. This is sometimes cited as a serious shortcoming of Java, even without any real evidence.
Empirical research shows that if digital data (such as price fluctuations) changes more frequently than once every 200 milliseconds, people will not be able to perceive it normally.
Applications are primarily for human use, so we have a useful rule of thumb: a Stop-The-World (STW) pause of 200 ms or less usually has no perceptible impact. Some applications have stricter requirements (such as streaming media), but many GUI applications do not.
A few applications (such as low-latency trading or machine control systems) cannot tolerate 200 millisecond pauses. Unless you are writing this type of application, users will rarely feel the impact of the garbage collector.
It is worth mentioning that in any system where the number of application threads exceeds the number of physical cores, the operating system must control time-sharing access to the CPU. Stop-The-World sounds scary, but in fact any application (whether JVM or other application) has to face the problem of contention for scarce computing resources.
If it is not measured, it is unclear what additional impact the JVM has on application performance.
In any case, turn on GC logging to determine whether pause times really are affecting the application. Extract the pause times from the logs, either manually or with scripts or tools, and then determine whether they actually cause problems for the application. Most importantly, ask yourself one critical question: are users actually complaining?
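As a sketch of such a script, the following extracts pause durations from GC log lines and flags pauses above a threshold. The sample line format is an assumption: real GC log formats vary across JVM versions and flags, so the pattern will need adjusting to your own logs.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Sketch of a handwritten GC-log check. The "..., 0.0234567 secs]" timing
// format assumed here is illustrative; adjust the pattern to the actual
// format your JVM version and flags produce.
public class GcPauseCheck {
    private static final Pattern PAUSE = Pattern.compile("([0-9]+\\.[0-9]+) secs\\]");

    // Returns the pause durations (in milliseconds) that exceed the threshold.
    public static List<Double> pausesOverMillis(List<String> logLines, double thresholdMs) {
        List<Double> offenders = new ArrayList<>();
        for (String line : logLines) {
            Matcher m = PAUSE.matcher(line);
            if (m.find()) {
                double ms = Double.parseDouble(m.group(1)) * 1000.0;
                if (ms > thresholdMs) offenders.add(ms);
            }
        }
        return offenders;
    }
}
```

Feeding a day’s worth of production logs through something like this, with the 200 ms rule of thumb as the threshold, turns “GC feels slow” into a countable fact.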
7. Handwritten object pools are appropriate for a wide class of applications
Prompted by the feeling that Stop-The-World pauses are somehow bad, a common reaction among application teams is to implement their own memory management inside the Java heap. This usually boils down to implementing an object pool (sometimes even full reference counting) and requiring every piece of code that uses the domain objects to participate.
This technique is almost always misleading. It is based on the understanding of the past, when object allocation was very expensive and modifying objects was much cheaper. Things are completely different now.
Today’s hardware is extremely efficient at allocation; with recent desktop or server hardware, allocation rates of at least 2 to 3 GB/s are achievable. That is a big number: it is hard to make full use of such bandwidth unless an application is written specifically to do so.
In general, object pooling is very difficult to implement correctly (especially when multiple threads are at work), and it imposes obligations that make it a poor general-purpose choice:
Every developer who touches the object-pool code must understand the pool and handle it correctly.
The boundary between code that knows about the pool and code that does not must be known to everyone and written down in the documentation.
This additional complexity must be kept up to date and reviewed regularly.
If any of these obligations is not met, the risk of bugs quietly creeping in (similar to pointer reuse in C) returns.
In short, object pooling should be used only when GC pauses are unacceptable and tuning and refactoring have failed to reduce the pauses to an acceptable level.
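To illustrate the discipline such a pool imposes, here is a deliberately minimal sketch, not a recommendation. All names are invented, and the capacity and thread-safety trade-offs are simplified:

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.function.Supplier;

// Sketch only, to show the obligations a pool creates. Every caller must
// pair acquire() with release(), and holding a reference after release()
// is exactly the pointer-reuse style of bug described above.
public class SimplePool<T> {
    private final BlockingQueue<T> idle;
    private final Supplier<T> factory;

    public SimplePool(int capacity, Supplier<T> factory) {
        this.idle = new ArrayBlockingQueue<>(capacity);
        this.factory = factory;
    }

    // Reuse an idle object if one exists, otherwise allocate a new one.
    public T acquire() {
        T obj = idle.poll();
        return (obj != null) ? obj : factory.get();
    }

    // The caller must not touch obj after this call: the pool may hand it
    // to another thread immediately. Objects released beyond capacity are
    // silently dropped and left for the garbage collector.
    public void release(T obj) {
        idle.offer(obj);
    }
}
```

Notice that even this toy version leaves open questions (what resets an object’s state between uses? who enforces the acquire/release pairing?) that every developer touching the code must answer consistently.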
8. In garbage collection, CMS is always a better choice than Parallel Old
Oracle JDK uses a parallel Stop-The-World collector by default to collect the old generation, that is, Parallel Old collector.
Concurrent-Mark-Sweep (CMS) is an alternative that allows application threads to continue running during most garbage collection cycles, but it comes at a cost and has some caveats.
Allowing the application thread to run together with the garbage collection thread inevitably brings about a problem: the application thread modifies the object graph, which may affect the viability of the object. This situation has to be cleaned up after the fact, so the CMS actually has two STW phases (usually very short).
This will have some consequences:
All application threads must be brought to safe points and are paused twice during each full collection;
Although collection runs concurrently with the application, the application’s throughput is reduced (often by around 50%);
When using CMS, the JVM spends considerably more bookkeeping data (and CPU cycles) on garbage collection than the other parallel collectors do.
Whether these costs are worth paying depends on the application. But there is no free lunch. The CMS collector is a commendable piece of engineering, but it is not a panacea.
So before confirming that CMS is the correct garbage collection strategy, you should first confirm that Parallel Old’s STW pause is indeed unacceptable and cannot be adjusted. Finally, I emphasize that all metrics must be obtained from a system equivalent to the production system.
9. Increasing the heap size can solve memory problems
When an application is in trouble and a GC problem is suspected, the reaction of many application teams is to increase the heap size. In some cases, this provides quick results and gives us time to consider more thoughtful solutions. However, if the cause of the performance problem is not fully understood, this strategy can make things worse.
Consider a badly coded application that produces a flood of domain objects with a representative lifetime of, say, 2 to 3 seconds. If the allocation rate is high enough, young-generation collections occur so frequently that domain objects are promoted to the old generation prematurely. Almost as soon as they arrive in the old generation they die, but they are not reclaimed until the next full collection.
If we increase the heap size of our application, all we do is increase the space used by relatively short-lived objects to enter and die. This will cause the Stop-The-World pause time to be longer, which is not beneficial to the application.
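The pattern described above can be reproduced with a toy program; the buffer sizes and counts here are arbitrary illustration values. Running it under GC logging with different heap sizes lets you observe whether the short-lived buffers are being promoted:

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Demo of allocation churn: buffers live just long enough to risk being
// promoted to the old generation before they die. Run with GC logging
// (e.g. -verbose:gc -XX:+PrintGCDetails -XX:+PrintTenuringDistribution)
// under different heap sizes and compare the logs. All numbers here are
// arbitrary illustration values.
public class ChurnDemo {
    // Allocates `iterations` short-lived 1 KB buffers, keeping at most
    // `windowCap` alive at once; returns how many are still live at the end.
    public static int churn(int iterations, int windowCap) {
        Deque<byte[]> window = new ArrayDeque<>();
        for (int i = 0; i < iterations; i++) {
            window.addLast(new byte[1024]);           // allocate a short-lived buffer
            if (window.size() > windowCap) {
                window.removeFirst();                 // ...which dies a little later
            }
        }
        return window.size();
    }

    public static void main(String[] args) {
        System.out.println("live buffers at exit: " + churn(200_000, 5_000));
    }
}
```

If the tenuring distribution in the logs shows these buffers reaching the old generation, a bigger heap will not save you; reducing the allocation rate or keeping the objects out of the old generation will.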
Before modifying the heap size or adjusting other parameters, it is essential to understand the dynamics of object allocation and lifetime. Acting without measuring performance data will only make matters worse. The collector’s tenuring distribution is particularly important here.
Conclusion
When it comes to performance tuning in Java, intuition is often misleading. We need experimental data and tools to help us visualize the behavior of the platform and enhance our understanding.
Garbage collection is the best example: the GC subsystem has enormous potential for tuning and for generating data to guide tuning, but for production applications it is difficult to make sense of the data it produces without tools.
By default, when running any Java process (including development environment and production environment), you should always use at least the following parameters:
-verbose:gc (print GC log)
-Xloggc: (more comprehensive GC log)
-XX:+PrintGCDetails (more detailed output)
-XX:+PrintTenuringDistribution (displays the age threshold used by the JVM to promote objects into the old generation)
Then analyze the logs with tools: you can use handwritten scripts and generate graphs, or use a visualizer such as GCViewer (open source) or jClarity Censum.