As programmers, we write code every day, but do we really understand its life cycle? This article briefly walks through it. The life of a piece of Java code, from birth to death, can be roughly divided into four stages: compilation, class loading, execution, and GC.
In Java, "compilation" is an ambiguous term. It may refer to a front-end compiler converting .java files into .class files; to the JVM's back-end runtime compiler (the JIT compiler) converting bytecode into machine code; or to a static ahead-of-time (AOT) compiler compiling .java files directly into native machine code. Here we mean the first case, which also matches the common understanding of compilation. What does compilation involve at this stage?
Lexical analysis converts the character stream of the source code into a sequence of tokens, while syntactic analysis builds an abstract syntax tree (AST) from that token sequence. An AST is a tree representation of the grammatical structure of program code. Each node of the tree represents one syntactic construct in the code, such as a package, type, modifier, operator, interface, return value, or even a code comment.
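The lexical-analysis step described above can be illustrated with a toy tokenizer. This is only a minimal sketch: the class TinyLexer and its rules are hypothetical and far simpler than javac's real scanner, which also handles keywords, string literals, comments, and multi-character operators.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy lexer: turns a character stream into a flat token list (illustrative only). */
class TinyLexer {
    static List<String> tokenize(String source) {
        List<String> tokens = new ArrayList<>();
        int i = 0;
        while (i < source.length()) {
            char c = source.charAt(i);
            if (Character.isWhitespace(c)) {
                i++; // whitespace separates tokens but is not a token itself
            } else if (Character.isJavaIdentifierStart(c)) {
                int start = i;
                while (i < source.length() && Character.isJavaIdentifierPart(source.charAt(i))) {
                    i++;
                }
                tokens.add(source.substring(start, i)); // keyword or identifier
            } else if (Character.isDigit(c)) {
                int start = i;
                while (i < source.length() && Character.isDigit(source.charAt(i))) {
                    i++;
                }
                tokens.add(source.substring(start, i)); // integer literal
            } else {
                tokens.add(String.valueOf(c)); // single-character operator or separator
                i++;
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(tokenize("int a = b + 2;")); // [int, a, =, b, +, 2, ;]
    }
}
```

A parser would then consume this token list and build tree nodes from it, for example a variable declaration node whose initializer is a binary "+" expression.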
After lexical and syntactic analysis comes the step of filling the symbol table. The information registered in the symbol table is used in later stages of compilation. What is a symbol table? It is a table made up of symbol addresses and symbol information; in its simplest form it can be thought of as the key-value pairs of a hash table.

Why are symbol tables used? One of their earliest applications was organizing information about program code. At first, computer programs were just plain numbers, but programmers soon found it far more convenient to use symbols for operations and memory addresses, and associating names with numbers requires a symbol table. As programs grew, the performance of symbol table operations became a bottleneck for development efficiency, and many data structures and algorithms were devised to make symbol tables faster. Broadly, these are sequential search in an unordered linked list, binary search in an ordered array, binary search trees, balanced search trees (mainly red-black trees here), and hash tables (based on separate chaining or on linear probing). In Java, java.util.TreeMap and java.util.HashMap are symbol tables implemented with a red-black tree and a separate-chaining hash table respectively. We will not go into more detail on symbol tables here; interested readers can look up the relevant material.
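To make the key-value view of a symbol table concrete, here is a small sketch that maps names to slot numbers using the two JDK classes mentioned above. The declare helper and the idea of numbering symbols by insertion order are assumptions made for illustration; they are not how javac represents its symbols.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

class SymbolTableDemo {
    /** Registers each name once, giving it the next free slot number. */
    static Map<String, Integer> declare(Map<String, Integer> table, String... names) {
        for (String name : names) {
            // A repeated declaration keeps the slot assigned the first time.
            table.putIfAbsent(name, table.size());
        }
        return table;
    }

    public static void main(String[] args) {
        // Hash table: fast lookup, no ordering guarantee.
        Map<String, Integer> hashed = declare(new HashMap<>(), "count", "total", "args");
        // Red-black tree: lookup in logarithmic time, keys kept in sorted order.
        Map<String, Integer> ordered = declare(new TreeMap<>(), "count", "total", "args");

        System.out.println(hashed.get("total")); // 1
        System.out.println(ordered.keySet());    // [args, count, total]
    }
}
```

The two implementations answer the same lookups; the tree-based table additionally supports ordered operations, at the cost of slower individual operations than a well-distributed hash table.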
Semantic analysis comes next, and one part of it is desugaring. The JVM does not support syntactic sugar such as generics at runtime, so during compilation these constructs are reduced to simpler basic syntax; this process is called desugaring. Generic erasure is one example: after compilation, List<String> and List<Integer> are both erased to the raw type List.

Bytecode generation is the last stage of the javac compilation process. Here the information produced by the previous steps is converted into bytecode and written to disk, and a small amount of code is added and transformed along the way. For example, the instance constructor <init>() and the class constructor <clinit>() are added at this stage.

Once the program has been compiled into bytecode, the next step is to load its classes into memory. The results of class loading are stored in the method area of the virtual machine's memory, so we first briefly introduce how that memory is laid out. The virtual machine's memory is divided into the program counter, the virtual machine stack, the native method stack, the heap, the method area (which includes the runtime constant pool), and direct memory.

The program counter is a small piece of memory that can be regarded as a line-number indicator for the bytecode being executed by the current thread. In the JVM's conceptual model, the bytecode interpreter works by changing the value of this counter to select the next bytecode instruction to execute.

The virtual machine stack stores the local variable table, the operand stack, dynamic links, method return information, and so on. The local variable table holds the primitive values and object references that are known at compile time. Like the program counter, the stack is private to each thread.

The native method stack is similar to the virtual machine stack. The difference is that the virtual machine stack serves Java methods (bytecode) executed by the virtual machine, while the native method stack serves the native methods it uses. Some virtual machines even combine the two into one.
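The idea that every active method call occupies one frame on its thread's stack can be observed indirectly. The sketch below, with the hypothetical class StackDepthDemo, recurses until the stack is exhausted and reports how many frames fit. The exact number is not fixed: it depends on the JVM, the stack size setting (for example -Xss on HotSpot), and the size of each frame.

```java
class StackDepthDemo {
    private static int depth;

    private static void recurse() {
        depth++;   // one more frame is now on this thread's stack
        recurse(); // never returns normally; ends with StackOverflowError
    }

    /** Recurses until the thread's stack is exhausted and returns the depth reached. */
    static int measureDepth() {
        depth = 0;
        try {
            recurse();
        } catch (StackOverflowError e) {
            // The stack has unwound back to this frame; depth records how many frames fit.
        }
        return depth;
    }

    public static void main(String[] args) {
        System.out.println("frames before overflow: " + measureDepth());
    }
}
```

Catching StackOverflowError is done here only to make the limit visible; production code would normally avoid unbounded recursion rather than recover from it.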
The heap is the largest piece of memory managed by the JVM. It is shared by all threads, and its only purpose is to store object instances. Almost all object instances are allocated here, with a few exceptions such as Class objects, which some virtual machine implementations have kept in the method area. The heap is also the main area managed by the garbage collector. Because most collectors use generational collection (introduced in detail later), the Java heap can be further subdivided into the new generation and the old generation, and the new generation is subdivided again into the Eden space, the From Survivor space, and the To Survivor space. For efficiency, the heap may also be carved into thread-private allocation buffers (TLABs). However it is divided, the contents are the same: every region stores object instances, and the divisions exist only to make memory allocation and reclamation more efficient.

The method area, like the heap, is shared by all threads. It stores data for classes that the virtual machine has loaded: class information, constants, static variables, code compiled by the just-in-time compiler, and so on.

The runtime constant pool is part of the method area. It mainly stores the literals and symbolic references generated at compile time.

Direct memory is not part of the virtual machine's runtime data areas, nor is it a memory area defined in the Java specification. It can simply be understood as off-heap memory. Its allocation is not limited by the size of the Java heap, but it is still limited by the total memory of the machine.

With the memory areas covered, let's get back to the topic. What does the class loading process consist of? Five steps: loading, verification, preparation, resolution (parsing), and initialization.
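As a brief aside before the five steps, the memory areas above can be glimpsed from inside a running program. The sketch below uses standard JDK APIs (Runtime, the java.lang.management memory pool beans, and ByteBuffer); the class name MemoryAreasDemo and the helper allocateOffHeap are hypothetical. The pool names that get printed depend on the JVM and collector in use, so none are assumed here.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.nio.ByteBuffer;

class MemoryAreasDemo {
    /** Allocates a buffer in direct (off-heap) memory. */
    static ByteBuffer allocateOffHeap(int bytes) {
        return ByteBuffer.allocateDirect(bytes);
    }

    public static void main(String[] args) {
        // Upper bound of the Java heap as seen by this JVM.
        System.out.println("max heap bytes: " + Runtime.getRuntime().maxMemory());

        // Each pool is reported as heap or non-heap memory; names vary by JVM.
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            System.out.println(pool.getType() + " : " + pool.getName());
        }

        ByteBuffer offHeap = allocateOffHeap(1024);    // direct memory
        ByteBuffer onHeap = ByteBuffer.allocate(1024); // backed by a heap array
        System.out.println(offHeap.isDirect() + " " + onHeap.isDirect()); // true false
    }
}
```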
Loading, verification, preparation, and initialization begin in that order, but resolution (parsing) does not have to: in some cases it may start only after initialization.

During the loading phase, the JVM does three things. First, it obtains the binary byte stream that defines the class, using the class's fully qualified name. Second, it converts the static storage structure represented by that byte stream into the runtime data structure of the method area. Finally, it creates a java.lang.Class object representing the class in memory, which serves as the access point for the class's data in the method area. The specification does not say that the byte stream in the first step must come from a *.class file. This flexibility allows it to be read from a ZIP archive (the basis of the JAR, EAR, and WAR formats), fetched from the network (applets), generated at runtime (dynamic proxies), generated from other files (the class produced from a JSP file), or read from a database.

Verification, as the name suggests, ensures that the information in the class file's byte stream meets the requirements of the JVM. A class file is not necessarily produced by a compiler; it can also be written directly with a hex editor. Verification includes file format verification, metadata verification, and bytecode verification. The specific checks are not covered here.

Preparation is the phase in which memory is formally allocated for class variables and their initial values are set. The memory used by these variables is allocated in the method area.

Resolution is the process in which the JVM replaces the symbolic references in the constant pool with direct references (a pointer to the target, a relative offset, or a handle).
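The difference between the preparation phase, which gives class variables their zero values, and the later initialization phase, which runs the programmer's assignments, can be observed with a small experiment. This is a sketch with a hypothetical class PreparationDemo; it relies only on the rule that static initializers run in textual order.

```java
class PreparationDemo {
    // Runs first during initialization. At that moment 'value' still holds the
    // zero default it received in the preparation phase.
    static int seenEarly = peek();

    // Runs second during initialization and overwrites the zero default.
    static int value = 123;

    static int peek() {
        return value;
    }

    public static void main(String[] args) {
        System.out.println(seenEarly + " " + value); // 0 123
    }
}
```

If value were declared static final with a constant initializer, the compiler would treat it as a compile-time constant, peek() would return 123, and the effect would no longer be visible.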
This is where the symbol table filled in during compilation proves its value. Resolution covers classes or interfaces, fields, and methods, including interface methods.

Initialization is the last step of the class loading process. In the preparation phase the variables were given their default initial values; in this step, class variables and other resources are initialized according to the programmer's own code. In other words, this phase executes the class constructor <clinit>() method. A class that has not yet been initialized must be initialized in the following situations.

When one of the four bytecode instructions new, getstatic, putstatic, or invokestatic is encountered. What do these four instructions correspond to? Put simply: creating an object with new, reading or setting a static field of a class, and calling a static method of a class.

When a method of the java.lang.reflect package is used to make a reflective call on the class.

When the virtual machine starts: the user specifies a main class to run (the class containing the main method), and the virtual machine initializes that class first.

When dynamic language support in JDK 1.7 or later is used: if the final resolution result of a java.lang.invoke.MethodHandle instance is a method handle of kind REF_getStatic, REF_putStatic, or REF_invokeStatic, and the class that the handle refers to has not been initialized.

After these two stages, compilation and class loading, the program starts to run. Running a program means executing all kinds of instructions, so how is the program actually executed?
This is where the back-end compiler (the JIT compiler) mentioned at the beginning of the article comes in, together with the interpreter; the HotSpot virtual machine uses both by default. The bytecode execution engine carries out the program's computations. When executing Java code it can choose interpreted execution (through the interpreter), compiled execution (native code produced by the JIT compiler), or a mix of both.

A stack frame is the data structure the virtual machine uses to support method invocation and execution. The way instructions push operands onto the stack and pop them off again is related to a classic algorithm, Dijkstra's two-stack algorithm for expression evaluation. We will not go deeper here; interested readers can look up the details.

Runtime optimization is equally important at this stage. The JVM design team concentrated performance optimization in the runtime compiler so that class files not produced by javac can also benefit from it. There are many optimization techniques; a few representative ones are common subexpression elimination, array bounds check elimination, method inlining, and escape analysis.

Finally, the program reaches the end of its life. How does the JVM decide that an object is dead? It uses reachability analysis. The basic idea is to take a set of objects called GC Roots as starting points and search downward from them. The path traveled by the search is called a reference chain. When no reference chain connects an object to any GC Root (in graph theory terms, the object is unreachable from the GC Roots), the object is considered unusable and is marked as reclaimable.
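Reachability analysis is essentially a graph traversal, which the following sketch models with plain collections. The object graph, the names in it, and the class ReachabilityDemo are hypothetical; a real collector walks actual object references starting from roots such as stack slots and static fields.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

class ReachabilityDemo {
    /** Returns every object reachable from the roots by following reference edges. */
    static Set<String> reachable(Map<String, List<String>> references, List<String> gcRoots) {
        Set<String> marked = new HashSet<>(gcRoots);
        Deque<String> pending = new ArrayDeque<>(gcRoots);
        while (!pending.isEmpty()) {
            String current = pending.pop();
            for (String target : references.getOrDefault(current, List.of())) {
                if (marked.add(target)) { // first visit: mark it and explore it later
                    pending.push(target);
                }
            }
        }
        return marked;
    }

    public static void main(String[] args) {
        // C and D reference each other, but nothing reachable points to them.
        Map<String, List<String>> heap = Map.of(
                "root", List.of("A"),
                "A", List.of("B"),
                "B", List.of(),
                "C", List.of("D"),
                "D", List.of("C"));
        Set<String> live = reachable(heap, List.of("root"));
        System.out.println(live.contains("B")); // true
        System.out.println(live.contains("C")); // false
    }
}
```

Everything outside the returned set is reclaimable. Note that C and D are reclaimable even though they still reference each other, because neither can be reached from a root.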
Once we know which objects can be reclaimed, when is garbage collection actually triggered? Only at safepoints, the places where program execution can be paused so that GC can run. It follows that pause time is the core concern of garbage collection: the collection algorithms and the collectors derived from them all aim to keep GC pauses as short as possible. The G1 collector, for example, builds a predictable pause-time model and deliberately avoids collecting the entire Java heap in one pass, reclaiming memory region by region instead.

When we introduced the memory areas earlier, we mentioned the new generation and the old generation. Different collectors work on the new generation, on the old generation, or on the whole heap without a fixed split between the two (as G1 does with its regions). With that in mind, the rest of this part introduces the garbage collection algorithms and the collectors that correspond to them.

Mark-sweep is the most basic collection algorithm. It has two phases, mark and sweep: first all objects to be reclaimed are marked, and once marking is complete the marked objects are reclaimed together. Its main shortcomings are that it is not very efficient and that it leaves behind many non-contiguous memory fragments. Fragmentation becomes a problem when the program later allocates a large object: even if the heap has enough free memory in total, there may be no contiguous block large enough, and another GC has to be triggered early. The collector that corresponds to this algorithm is CMS.

After all of the above, you may have a rough picture of the life of Java code, or some parts may still be unclear. Let's review the whole process with an example: what happens when we create a new object?
Putting the earlier pieces together: when the JVM encounters a new instruction, it first checks whether the instruction's operand can be located as a symbolic reference to a class in the constant pool of the method area, and whether the class that the symbolic reference represents has already been loaded, resolved, and initialized. If not, the class loading process must run first.

After the class loading check passes, the JVM allocates memory for the new object on the heap. The size required is fully known once the class has been loaded. If the heap memory is laid out contiguously, with used memory on one side and free memory on the other, allocation only has to move the dividing pointer toward the free side by the size of the object. This method is called "bump the pointer". If used and free memory are interleaved, the JVM maintains a list that records which blocks are available, allocates from it, and updates the list. This method is called a "free list". Which method is used depends on the garbage collector managing the heap, as discussed earlier.

Once the memory has been allocated, the virtual machine performs the necessary initialization and then sets up the object itself. This information goes into the object header: the class metadata, the object's hash code, its GC generation age, and so on. When these tasks are complete, a new object exists from the virtual machine's point of view. From the program's point of view it is not finished yet: the next step is to call the instance constructor <init>() so that the object is initialized as the programmer intends.
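The "bump the pointer" strategy is simple enough to model in a few lines. The sketch below treats a region as a range of byte offsets; the class BumpPointerDemo and its return convention (-1 when the region is full) are assumptions for illustration, and a real JVM additionally deals with alignment, thread safety, and TLABs.

```java
class BumpPointerDemo {
    private final int capacity;
    private int top; // boundary between used memory (below) and free memory (above)

    BumpPointerDemo(int capacity) {
        this.capacity = capacity;
    }

    /** Returns the start offset of the new object, or -1 if it does not fit. */
    int allocate(int size) {
        if (size <= 0 || size > capacity - top) {
            return -1; // a real JVM would trigger a GC or use another region here
        }
        int start = top;
        top += size; // "bump" the pointer past the newly allocated object
        return start;
    }

    public static void main(String[] args) {
        BumpPointerDemo region = new BumpPointerDemo(64);
        System.out.println(region.allocate(16)); // 0
        System.out.println(region.allocate(24)); // 16
        System.out.println(region.allocate(32)); // -1, only 24 bytes remain
    }
}
```

Allocation is a single comparison and addition, which is why this method needs a heap without fragmentation. A free-list allocator has to search for a block that fits instead, but it works on a fragmented heap.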
When a class is being initialized and its parent class has not yet been initialized, the initialization of the parent class is triggered first.
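The parent-first rule, and the fact that initialization is deferred until a class is first actively used, can both be seen in a short experiment. This is a sketch with hypothetical class names; the static blocks only record the order in which initialization happens.

```java
import java.util.ArrayList;
import java.util.List;

class InitOrderDemo {
    static final List<String> LOG = new ArrayList<>();

    static class Parent {
        static { LOG.add("Parent initialized"); }
    }

    static class Child extends Parent {
        static int counter = 0;
        static { LOG.add("Child initialized"); }
    }

    /** Records a marker, then touches a static field of Child for the first time. */
    static List<String> touchChild() {
        LOG.add("before first use");
        Child.counter++; // static field access triggers initialization of Child
        return new ArrayList<>(LOG);
    }

    public static void main(String[] args) {
        System.out.println(touchChild());
        // [before first use, Parent initialized, Child initialized]
    }
}
```

Neither nested class is initialized when InitOrderDemo itself starts running. Initialization happens only at the first static field access, and Parent is initialized before Child.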
GC
Mark-sweep algorithm
Copying algorithm
Mark-compact algorithm
This algorithm is used for garbage collection in the old generation. Objects there tend to survive for a long time, so the copying algorithm would have to copy many survivors on every collection and would also waste space. The marking phase is the same as in mark-sweep, but instead of clearing the reclaimable objects in place, the collector moves all surviving objects toward one end and then frees the memory beyond the boundary. The corresponding collectors are the Serial Old collector and the Parallel Old collector.

Generational collection algorithm
Most current commercial virtual machines use this approach. The idea is to divide the heap into generations, as mentioned earlier, and to use a different collection algorithm in each region: the new generation uses the copying algorithm, and the old generation uses the mark-compact or mark-sweep algorithm.
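The difference between mark-sweep and mark-compact is easiest to see on a tiny model heap. In the sketch below the heap is an array of slots and the marked array stands for the result of a marking phase that has already run; the class CompactDemo and both method names are hypothetical, and real collectors also have to update every reference to a moved object.

```java
import java.util.Arrays;

class CompactDemo {
    /** Mark-sweep: clear dead slots in place; survivors stay where they were. */
    static String[] sweep(String[] heap, boolean[] marked) {
        String[] result = heap.clone();
        for (int i = 0; i < result.length; i++) {
            if (!marked[i]) {
                result[i] = null;
            }
        }
        return result;
    }

    /** Mark-compact: slide survivors toward index 0, leaving one free block at the end. */
    static String[] compact(String[] heap, boolean[] marked) {
        String[] result = new String[heap.length];
        int boundary = 0; // everything at or beyond this index is free
        for (int i = 0; i < heap.length; i++) {
            if (marked[i]) {
                result[boundary++] = heap[i];
            }
        }
        return result;
    }

    public static void main(String[] args) {
        String[] heap = {"a", "b", "c", "d", "e"};
        boolean[] marked = {true, false, true, false, true};
        System.out.println(Arrays.toString(sweep(heap, marked)));   // [a, null, c, null, e]
        System.out.println(Arrays.toString(compact(heap, marked))); // [a, c, e, null, null]
    }
}
```

After the sweep, the two free slots are not adjacent, so a two-slot object cannot be placed even though two slots are free. After compaction the free space is contiguous, which is the property the old generation benefits from.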