Original English text: Maoni Stephens; translated and compiled by Zhao Yukai (@玉开Sir)
The CLR garbage collector treats objects differently depending on their size, and large objects are handled very differently from small ones. Compacting memory is one example: moving large objects around in memory is expensive. This article looks at how the garbage collector handles large objects and what impact they can have on program performance.
Large Object Heap and Garbage Collection
In .NET 1.0 and 2.0, an object of 85,000 bytes or more is considered a large object. The number was chosen through performance tuning. When an allocation request reaches this threshold, the object is allocated on the large object heap. What does this mean? To answer that, we need to understand how .NET garbage collection works.
As most people know, the .NET GC is a generational collector. There are three generations: generation 0, generation 1 and generation 2. Generation 0 holds the youngest objects and generation 2 holds the objects that have lived longest. The GC collects by generation for performance reasons; most objects die in generation 0. For example, in an ASP.NET program, the objects associated with a request should be dead by the end of that request. Objects that survive a generation 0 collection become generation 1 objects; in other words, generation 1 acts as a buffer between long-lived objects and objects that are about to die.
From a generational point of view, large objects belong to generation 2, because they are only collected during a generation 2 collection. When a generation is collected, all younger generations are collected with it. For example, a generation 1 collection collects both generation 1 and generation 0, and a generation 2 collection also collects generation 1 and generation 0.
Generations are the garbage collector's logical view of memory. Physically, objects live on different managed heaps. A managed heap is a region of memory that the garbage collector reserves from the operating system (by calling the Windows API VirtualAlloc). When the CLR is loaded, it initializes two managed heaps: the large object heap (LOH) and the small object heap (SOH).
An allocation request places the managed object on the appropriate managed heap. If the object is smaller than 85,000 bytes it goes on the SOH; otherwise it goes on the LOH.
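The placement rule can be observed from code. The sketch below is only an illustration: the class and method names are made up for this example, and it relies on GC.GetGeneration reporting generation 2 for objects on the LOH.

```csharp
using System;

public static class HeapPlacementDemo
{
    // Allocates a byte array of the given length and reports the generation
    // the GC assigns to it immediately after allocation.
    public static int GenerationOfNewArray(int lengthInBytes)
    {
        byte[] buffer = new byte[lengthInBytes];
        return GC.GetGeneration(buffer);
    }

    public static void Main()
    {
        // A small array lands on the SOH and starts out in generation 0.
        Console.WriteLine(GenerationOfNewArray(1000));

        // An array of 85,000 elements (plus its header) is over the threshold,
        // so it lands on the LOH and is reported as generation 2.
        Console.WriteLine(GenerationOfNewArray(85000));
    }
}
```

On a typical runtime this prints 0 for the small array and 2 for the large one.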
On the SOH, an object that survives a garbage collection is promoted to the next generation. An object that survives its first collection becomes a generation 1 object; if it also survives the next collection, it becomes a generation 2 object. Generation 2 is the oldest generation, so its objects are not promoted any further.
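Promotion can be watched with GC.GetGeneration and forced collections. This is a minimal sketch with made-up names; the exact numbers depend on the runtime, although a small object that stays referenced usually moves from 0 to 1 to 2.

```csharp
using System;

public static class PromotionDemo
{
    // Records the generation of one small object before any collection,
    // after a first forced collection, and after a second one.
    public static int[] TrackGenerations()
    {
        object survivor = new object();
        int[] seen = new int[3];

        seen[0] = GC.GetGeneration(survivor);
        GC.Collect();                          // survivor is still referenced, so it survives
        seen[1] = GC.GetGeneration(survivor);
        GC.Collect();
        seen[2] = GC.GetGeneration(survivor);

        GC.KeepAlive(survivor);                // keep the object reachable until this point
        return seen;
    }

    public static void Main()
    {
        Console.WriteLine(string.Join(", ", TrackGenerations()));
    }
}
```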
When a garbage collection is triggered, the garbage collector compacts the small object heap by moving the surviving objects together. On the large object heap, because moving memory is so expensive, the CLR team chose to sweep instead: dead objects are placed on a free list that is used to satisfy later large object allocation requests. Adjacent dead objects are merged into a single free block.
Note that up to .NET 4.0 the large object heap is not compacted, but this may change in the future. So if you allocate a large object and need to be sure it does not move, you should still pin it, for example with the fixed statement.
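Pinning can be sketched as follows. The text names the fixed statement, which needs an unsafe context; this example swaps in GCHandle with GCHandleType.Pinned, which pins the same way but compiles without unsafe code. The class and method names are hypothetical.

```csharp
using System;
using System.Runtime.InteropServices;

public static class PinningDemo
{
    // Pins a large array, forces a collection, and checks that the
    // array's address is the same before and after.
    public static bool AddressStableAcrossGc()
    {
        byte[] buffer = new byte[100000];
        GCHandle handle = GCHandle.Alloc(buffer, GCHandleType.Pinned);
        try
        {
            IntPtr before = handle.AddrOfPinnedObject();
            GC.Collect();
            IntPtr after = handle.AddrOfPinnedObject();
            return before == after;
        }
        finally
        {
            handle.Free(); // always release the pin; long-lived pins hinder the GC
        }
    }

    public static void Main()
    {
        Console.WriteLine(AddressStableAcrossGc());
    }
}
```

In an unsafe context the scoped equivalent is `fixed (byte* p = buffer) { ... }`, which pins the array only for the duration of the block.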
The following is a schematic diagram of a collection on the small object heap (SOH)
When are large objects collected?
Before discussing when large objects are collected, let's look at when garbage collection happens in general. A garbage collection occurs under the following circumstances:
1. An allocation exceeds the generation 0 threshold or the large object heap threshold. Most garbage collections on the managed heap happen for this reason.
2. Program code calls GC.Collect. If GC.MaxGeneration is passed (or no argument is given), all generations are collected, including the large object heap.
3. The system is low on memory, that is, the application receives a high memory notification from the operating system.
4. The garbage collector's tuning decides that a generation 2 collection would be productive, and triggers one.
Each generation has a property that holds its size threshold. When you allocate into a generation, you bring its total size closer to that threshold; when an allocation takes the generation over its threshold, a garbage collection is triggered. So allocating small objects consumes the generation 0 threshold, and allocating large objects consumes the large object heap threshold. When the garbage collector promotes objects into generation 1 or 2, the thresholds of those generations are consumed. These thresholds are adjusted dynamically while the program runs.
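The explicit trigger, calling GC.Collect with GC.MaxGeneration, can be sketched like this. The names are made up for the example. It uses a WeakReference to observe that an unreferenced large array is reclaimed; that is what normally happens, though the exact moment an object becomes unreachable is up to the runtime.

```csharp
using System;
using System.Runtime.CompilerServices;

public static class ForcedCollectionDemo
{
    // Allocates a large array that nothing references strongly once this
    // method returns; NoInlining keeps the temporary out of the caller's frame.
    [MethodImpl(MethodImplOptions.NoInlining)]
    private static WeakReference AllocateUnreferencedLargeArray()
    {
        return new WeakReference(new byte[200000]);
    }

    // Returns true if a forced full collection both counted as a
    // generation 2 collection and reclaimed the large array.
    public static bool LargeObjectReclaimedByFullGc()
    {
        WeakReference tracker = AllocateUnreferencedLargeArray();
        int gen2Before = GC.CollectionCount(GC.MaxGeneration);

        GC.Collect(GC.MaxGeneration);   // collects every generation and the LOH

        int gen2After = GC.CollectionCount(GC.MaxGeneration);
        return gen2After > gen2Before && !tracker.IsAlive;
    }

    public static void Main()
    {
        Console.WriteLine(LargeObjectReclaimedByFullGc());
    }
}
```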
Performance impact of the large object heap
Let us first look at the cost of allocation. The CLR guarantees that the memory for every new object it hands out has been cleared. This means that allocation cost is dominated by the cost of clearing memory (unless the allocation triggers a garbage collection). If clearing 1 byte takes 2 cycles, clearing the smallest large object takes 170,000 cycles. People do not usually allocate very large objects, but allocating a 16 MB object on a 2 GHz machine takes about 16 ms just to clear the memory. That is a high price.
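The arithmetic behind these figures can be written out as a small calculation. The 2 cycles per byte and the 2 GHz clock are the assumptions stated in the text, not measurements, and the helper name is made up for this sketch.

```csharp
using System;
using System.Globalization;

public static class ClearingCostEstimate
{
    // Estimated time in milliseconds to clear objectBytes of memory,
    // given a cost per byte in cycles and a clock rate in Hz.
    public static double ClearTimeMilliseconds(long objectBytes, double cyclesPerByte, double clockHz)
    {
        double cycles = objectBytes * cyclesPerByte;
        return cycles / clockHz * 1000.0;
    }

    public static void Main()
    {
        // Smallest large object: 85,000 bytes * 2 cycles = 170,000 cycles, about 0.085 ms.
        double smallest = ClearTimeMilliseconds(85000, 2, 2e9);
        // 16 MB object: 16,777,216 bytes * 2 cycles = 33,554,432 cycles, about 16.8 ms.
        double sixteenMb = ClearTimeMilliseconds(16L * 1024 * 1024, 2, 2e9);

        Console.WriteLine(smallest.ToString("F3", CultureInfo.InvariantCulture));
        Console.WriteLine(sixteenMb.ToString("F3", CultureInfo.InvariantCulture));
    }
}
```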
Now for the cost of collection. As mentioned earlier, large objects are collected together with generation 2. A generation 2 collection is triggered when either the LOH or generation 2 exceeds its threshold. If it is triggered because the LOH exceeded its threshold, generation 2 itself may not contain much garbage. That is not a big problem if generation 2 is small. But if generation 2 is large and holds many objects, excessive generation 2 collections will cause performance problems. If you allocate large objects on a very temporary basis, a lot of time can be spent running garbage collections; in other words, repeatedly allocating large objects and letting them go has a significant negative effect on performance.
The huge objects on the LOH are usually arrays (it is rare for a single object instance to be that large). If the array elements are rich in references, the cost is high; if the elements contain no references, the garbage collector does not need to walk the array at all. For example, suppose you store the nodes of a binary tree in an array. One way is to have each node hold strong references to its left and right children:
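The effect of temporary large allocations can be made visible by counting generation 2 collections. This is a rough sketch with made-up names. The buffer size and iteration count are arbitrary, the counts vary by runtime and machine, and the reused-buffer variant is included only as a point of comparison.

```csharp
using System;

public static class TemporaryLargeObjectsDemo
{
    private const int BufferSize = 1024 * 1024; // 1 MB, well above the LOH threshold

    // Allocates a fresh large buffer on every iteration and returns how many
    // generation 2 collections happened while doing so.
    public static int Gen2CollectionsWithFreshBuffers(int iterations)
    {
        int before = GC.CollectionCount(2);
        for (int i = 0; i < iterations; i++)
        {
            byte[] buffer = new byte[BufferSize]; // becomes garbage right away
            buffer[i % BufferSize] = 1;
        }
        return GC.CollectionCount(2) - before;
    }

    // Does the same work with a single buffer that is cleared and reused.
    public static int Gen2CollectionsWithReusedBuffer(int iterations)
    {
        byte[] buffer = new byte[BufferSize];
        int before = GC.CollectionCount(2);
        for (int i = 0; i < iterations; i++)
        {
            Array.Clear(buffer, 0, buffer.Length);
            buffer[i % BufferSize] = 1;
        }
        return GC.CollectionCount(2) - before;
    }

    public static void Main()
    {
        Console.WriteLine("fresh buffers:  " + Gen2CollectionsWithFreshBuffers(500));
        Console.WriteLine("reused buffer:  " + Gen2CollectionsWithReusedBuffer(500));
    }
}
```

The first number is normally well above the second, because every fresh buffer consumes the LOH threshold and each generation 2 collection it triggers also has to examine everything else in generation 2.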
class Node
{
    Data d;
    Node left;   // strong reference to the left child
    Node right;  // strong reference to the right child
}

Node[] binaryTree = new Node[num_nodes];

If num_nodes is large, the garbage collector has to follow at least two references for every element. An alternative is to store the array indexes of the left and right children in each node:

class Node
{
    Data d;
    uint left_index;   // index of the left child in binaryTree
    uint right_index;  // index of the right child in binaryTree
}

This removes the references between elements; you reach a child with binaryTree[left_index]. The garbage collector no longer has to follow child references when it collects.
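A runnable version of the index-based layout might look like the following. Only Node, d, left_index, right_index, binaryTree and num_nodes come from the text. The Data payload, the NoChild sentinel and the helper methods are assumptions added for illustration.

```csharp
using System;

public struct Data { public int Value; }   // hypothetical payload type

public class Node
{
    public Data d;
    public uint left_index;    // index of the left child in the array
    public uint right_index;   // index of the right child in the array
}

public static class IndexedTreeDemo
{
    public const uint NoChild = uint.MaxValue;   // assumed marker for "no child"

    // Builds a complete binary tree in which node i has children 2i+1 and 2i+2.
    public static Node[] BuildCompleteTree(uint num_nodes)
    {
        Node[] binaryTree = new Node[num_nodes];
        for (uint i = 0; i < num_nodes; i++)
        {
            uint left = 2 * i + 1;
            uint right = 2 * i + 2;
            binaryTree[i] = new Node
            {
                d = new Data { Value = (int)i },
                left_index = left < num_nodes ? left : NoChild,
                right_index = right < num_nodes ? right : NoChild
            };
        }
        return binaryTree;
    }

    // Walks the tree through the stored indexes instead of object references.
    public static long SumValues(Node[] binaryTree, uint index)
    {
        if (index == NoChild) return 0;
        Node node = binaryTree[index];
        return node.d.Value
             + SumValues(binaryTree, node.left_index)
             + SumValues(binaryTree, node.right_index);
    }

    public static void Main()
    {
        Node[] tree = BuildCompleteTree(7);
        Console.WriteLine(SumValues(tree, 0));   // 0+1+2+3+4+5+6 = 21
    }
}
```

The trade-off is that indexes are only meaningful relative to one particular array, so code that moves or copies nodes has to keep them consistent by hand.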
Collecting performance data for the large object heap
There are several ways to collect performance data for the large object heap. Before explaining them, let's talk about why you would want to collect this data.
When you start collecting performance data for a particular area, ideally you have already found evidence that points to a bottleneck there, or you have examined every other area you suspected and found nothing that explains the problem.
The .NET CLR Memory performance counters are usually the first tool to reach for when investigating performance problems. The counters relevant to the LOH are # Gen 2 Collections and Large Object Heap size. # Gen 2 Collections shows how many generation 2 garbage collections have occurred since the process started. Large Object Heap size shows the current size of the large object heap, including free space; it is updated at the end of each garbage collection, not on every allocation.
You can refer to the figure below to see the .NET CLR Memory counters in the Windows performance monitor
You can also read these counters programmatically; many people collect performance counters from code to help find performance bottlenecks.
Of course, you can also use the WinDbg debugger to inspect the large object heap.
A final reminder: so far the large object heap is not compacted as part of garbage collection, but this is only a CLR implementation detail and program code should not rely on it. If you need to be sure an object will not be moved by the garbage collector, pin it with the fixed statement.