A Detailed Explanation of the V8 Garbage Collector


The garbage collector is a double-edged sword. Its advantage is that it greatly simplifies a program's memory-management code, since memory no longer has to be managed explicitly by the programmer, which reduces (but does not eliminate) memory leaks in long-running programs. For some programmers it can even improve the performance of their code.

On the other hand, choosing a garbage collector means giving up full control over memory, and that is the crux of the problem for mobile development. In JavaScript there is no way to manage memory from within the program at all: the ECMAScript standard does not expose any garbage-collector interface, so a web application has no way to manage memory or give hints to the garbage collector.

Strictly speaking, languages that use a garbage collector are not necessarily better or worse in performance than languages that do not. In C, allocating and freeing memory can be very expensive operations, and heap management tends to become complicated in order to allow allocated memory to be freed later. In managed-memory languages, allocation often amounts to just bumping a pointer, but then we pay the huge cost of the garbage collector stepping in to reclaim memory when it runs out. A naive garbage collector can cause long and unpredictable pauses in a running program, which directly hurts the experience of interactive systems (especially those with animation). Reference-counting systems are often touted as an alternative to garbage collection, but they too can cause unexpected pauses when the last reference to a large subgraph of objects is dropped, and they carry considerable overhead on frequent read, rewrite, and store operations.

For better or worse, JavaScript needs a garbage collector. V8's garbage-collector implementation is now mature, with excellent performance, short pauses, and very manageable overhead.

Basic concepts

The most basic problem a garbage collector has to solve is identifying the memory that can be reclaimed. Once identified, those memory regions can be reused by future allocations or returned to the operating system. An object is dead when it is not live (obviously). An object is live if and only if it is pointed to by a root object or by another live object. Root objects are defined to be live; they are the objects referenced by the browser or by V8. For example, objects pointed to by local variables are roots, because the stack is scanned as a root; global objects are roots, because they are always accessible; and browser objects, such as DOM elements, are also roots, although in some cases they are only weakly referenced.

Viewed from another angle, the definition above is quite loose. In fact, we could say that an object is live as long as the program could still reach it. For example:

function f() {
    var obj = {x: 12};
    g(); // might contain an infinite loop
    return obj.x;
}
If g() never returns, obj will never actually be used again, but the garbage collector cannot know that; because obj remains reachable from the stack while f() is on it, it is treated as live. V8 splits the heap into a small new space, where most objects are allocated, and a much larger old space. The new space is collected with the scavenge algorithm (Cheney's algorithm), which copies live objects from from-space to to-space; the pseudocode below sketches it:
def scavenge():
    swap(fromSpace, toSpace)
    allocationPtr = toSpace.bottom
    scanPtr = toSpace.bottom

    # Evacuate the objects directly reachable from the roots.
    for i = 0..len(roots):
        root = roots[i]
        if inFromSpace(root):
            rootCopy = copyObject(&allocationPtr, root)
            setForwardingAddress(root, rootCopy)
            roots[i] = rootCopy

    # Breadth-first scan of the copied objects.
    while scanPtr < allocationPtr:
        obj = object at scanPtr
        scanPtr += size(obj)
        n = sizeInWords(obj)
        for i = 0..n:
            if isPointer(obj[i]) and not inOldSpace(obj[i]):
                fromNeighbor = obj[i]
                if hasForwardingAddress(fromNeighbor):
                    toNeighbor = getForwardingAddress(fromNeighbor)
                else:
                    toNeighbor = copyObject(&allocationPtr, fromNeighbor)
                    setForwardingAddress(fromNeighbor, toNeighbor)
                obj[i] = toNeighbor

def copyObject(*allocationPtr, object):
    copy = *allocationPtr
    *allocationPtr += size(object)
    memcpy(copy, object, size(object))
    return copy

While this algorithm runs, we always maintain two pointers into to-space: allocationPtr points to where the next object will be allocated, and scanPtr points to the next object to be scanned for liveness. Objects below scanPtr have been fully processed: they and their neighbors are all in to-space, and their pointers have been updated. Objects between scanPtr and allocationPtr have been copied into to-space, but the pointers inside them may still refer to objects in from-space that have not been copied yet. Logically, you can think of the objects between scanPtr and allocationPtr as the queue of a breadth-first search.

Translator's note: in a breadth-first search, nodes are taken from the head of the queue and expanded, and the resulting child nodes are appended to the tail of the queue; the region between the two pointers plays exactly that role here.

At the start of the algorithm we copy every object in new space that is reachable from the roots, then enter a large loop. On each iteration we remove one object from the queue, increment scanPtr, and follow the pointers inside that object. If a pointer does not point into from-space, we ignore it, since it must point into old space, which is not our concern here. If a pointer points to an object in from-space that has not yet been copied (no forwarding address has been set), we copy that object into to-space, i.e. append it to the end of our queue, and advance allocationPtr. At the same time we store a forwarding address in the first word of the from-space object, replacing its map pointer. The forwarding address is simply the address the object lives at after being copied. The garbage collector can easily tell a forwarding address from a map pointer, because map pointers are tagged while forwarding addresses are not. If we find a pointer whose target has already been copied (a forwarding address has been set), we simply update the pointer to the forwarding address and tag it.

The algorithm terminates when all objects have been processed (that is, when scanPtr catches up with allocationPtr). At that point everything left in from-space is garbage and can be freed or reused later.

Secret Weapon: Write Barrier

One detail was glossed over above: if an object in new space is pointed to by only one pointer, and that pointer happens to live in an object in old space, how do we know that the new-space object is live? We obviously do not want to traverse old space again: it contains many objects, and doing so on every scavenge would cost far too much.

To solve this problem, there is a list in the write buffer that records every object in old space that points into new space. When an object is freshly allocated there are no pointers to it; whenever an object in old space later acquires a pointer to an object in new space, we record that cross-space pointer. Because this recording always happens on a write operation, it is called a write barrier: every write has to pass through this barrier.
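Purely as an illustration of the idea (none of the names below come from V8; the real barrier is emitted as machine code by the compiler), a write barrier boils down to a small check on every pointer store:

# Minimal sketch of a write barrier that records old -> new pointers.
NEW_SPACE, OLD_SPACE = "new", "old"

class HeapObject:
    def __init__(self, space, n_fields):
        self.space = space                 # which space this object lives in
        self.fields = [None] * n_fields

store_buffer = []  # remembered set: (holder, slot) pairs whose value is in new space

def write_field(holder, slot, value):
    """Every pointer store goes through this barrier."""
    holder.fields[slot] = value
    # Only the rare old -> new case needs to be remembered for the next scavenge.
    if isinstance(value, HeapObject) and holder.space == OLD_SPACE and value.space == NEW_SPACE:
        store_buffer.append((holder, slot))

# Usage: an old-space object acquiring a pointer to a new-space object.
old_obj = HeapObject(OLD_SPACE, 1)
young = HeapObject(NEW_SPACE, 0)
write_field(old_obj, 0, young)             # this store gets recorded in store_buffer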

You might wonder: if every write has to go through the write barrier, doesn't that add a lot of extra code? Yes, this is one of the costs of our garbage-collection scheme, but the situation is not as bad as it sounds; writes are, after all, less frequent than reads. Some garbage-collection algorithms (not V8's) use read barriers instead, and those require hardware assistance to keep the cost acceptably low. V8 also has several optimizations that reduce the cost of the write barrier:

Most script execution time is spent in Crankshaft, and Crankshaft can often statically determine that an object lives in new space. Writes into such objects need no write barrier.

A newer Crankshaft optimization allocates an object on the stack when no non-local reference to it exists. Writes into a stack-allocated object obviously need no write barrier. (Translator's note: new space and old space are both on the heap.)

The "old → new" case is comparatively rare, so by making the two common cases ("new → new" and "old → old") fast, we can improve performance in the majority of situations. Every page is aligned to 1 MB, so given an object's memory address, the page containing it can be located quickly by masking off the low 20 bits; the page header then carries flags saying whether the page belongs to new space or old space. So by checking which spaces the two objects involved in a write belong to, we can quickly decide whether the pointer is "old → new".
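As a back-of-the-envelope sketch of that address trick, using the 1 MB alignment described above (the code is illustrative, not V8's):

PAGE_SIZE = 1 << 20           # pages are 1 MB in size and 1 MB aligned
PAGE_MASK = ~(PAGE_SIZE - 1)  # clears the low 20 bits of an address

def page_start(address):
    """Return the start address of the page containing `address`."""
    return address & PAGE_MASK

# Two addresses lie on the same page iff their high bits agree:
a, b = 0x7f3a_1234_5678, 0x7f3a_1234_9abc
print(hex(page_start(a)), page_start(a) == page_start(b))  # same page -> True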

Once an "old → new" pointer is found, it is recorded at the end of the write buffer. After a while (when the write buffer fills up), we sort it, merge identical entries, and drop pointers that no longer fit the "old → new" case. (Translator's note: this reduces the number of recorded pointers, which correspondingly shortens the time spent processing them.)
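A tiny sketch of that compaction step, reusing the hypothetical store_buffer entries and HeapObject class from the earlier barrier sketch (again invented for illustration):

def compact_store_buffer(buffer):
    """Sort, merge duplicate entries, and drop slots that no longer hold old -> new pointers."""
    entries = sorted(set(buffer), key=lambda e: (id(e[0]), e[1]))
    def still_old_to_new(holder, slot):
        value = holder.fields[slot]
        return isinstance(value, HeapObject) and value.space == NEW_SPACE
    return [(h, s) for (h, s) in entries if still_old_to_new(h, s)]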

"Mark-Clear" algorithm and "Mark-Compact" algorithm

The scavenge algorithm works well for quickly collecting and compacting a small amount of memory, but it is too expensive for large amounts, because it requires two spaces, from-space and to-space. That is acceptable for a few MB, but impractical for the hundreds of MB of data that old space can contain. For such a large region we use two other, closely related algorithms: mark-sweep and mark-compact.

Both algorithms operate in two phases: a marking phase followed by a sweeping or a compacting phase.

During the marking phase, all live objects on the heap are marked. Each page contains a bitmap used for marking, with one bit for every word in the page (translator's note: a pointer is one word). This is necessary because a mark may begin at any word-aligned position. Such a bitmap obviously takes up some space (3.1% on 32-bit systems, 1.6% on 64-bit systems), but every memory-management scheme has an overhead of this kind, so it is not excessive. In addition, two bits are used to represent the state of a marked object; since objects are at least two words long, these bits never overlap. There are three states: a white object has not yet been discovered by the garbage collector; a grey object has been discovered, but not all of its neighbors have been processed; a black object has been discovered and all of its neighbors have been fully processed.
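A toy illustration of such a two-bit colour encoding on a per-word bitmap; the constant values and class below are invented and need not match V8's actual layout:

WHITE, GREY, BLACK = 0b00, 0b10, 0b11     # illustrative encodings only

class MarkBitmap:
    """One bit per word of a page; an object's colour uses the two bits at its first word."""
    def __init__(self, words_per_page):
        self.bits = [0] * words_per_page

    def set_color(self, word_index, color):
        self.bits[word_index] = (color >> 1) & 1
        self.bits[word_index + 1] = color & 1

    def color(self, word_index):
        return (self.bits[word_index] << 1) | self.bits[word_index + 1]

bm = MarkBitmap(256)        # a toy page of 256 words
bm.set_color(8, GREY)       # the object starting at word 8 becomes grey
assert bm.color(8) == GREY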

If you think of the heap's objects as a directed graph connected by pointers, the core of the marking algorithm is essentially a depth-first search. At the beginning of marking the bitmap is empty and every object is white. Objects reachable from the roots are coloured grey and pushed onto a separately allocated deque used for marking. In each step of the marking loop, the GC pops an object off the deque, colours it black, then colours its neighbours grey and pushes them onto the deque. The process ends when the deque is empty and every discovered object has turned black. Very large objects, such as long arrays, may be processed in pieces to avoid overflowing the deque. If the deque does overflow, objects are still coloured grey but are no longer pushed (so their neighbours get no chance to be coloured). Therefore, once the deque is empty, the GC still has to scan the heap to make sure every grey object has become black; grey objects that were not blackened are pushed back onto the deque and processed again.

Here is pseudocode for the marking algorithm:

markingDeque = []
overflow = false

def markHeap():
    for root in roots:
        mark(root)
    do:
        if overflow:
            overflow = false
            refillMarkingDeque()
        while !markingDeque.isEmpty():
            obj = markingDeque.pop()
            setMarkBits(obj, BLACK)
            for neighbor in neighbors(obj):
                mark(neighbor)
    while overflow

def mark(obj):
    if markBits(obj) == WHITE:
        setMarkBits(obj, GREY)
        if markingDeque.isFull():
            overflow = true
        else:
            markingDeque.push(obj)

def refillMarkingDeque():
    for each obj on heap:
        if markBits(obj) == GREY:
            markingDeque.push(obj)
            if markingDeque.isFull():
                overflow = true
                return

When the marking algorithm ends, every live object is coloured black and every dead object is still white. This result is exactly what the sweeping and compacting phases need.

After marking finishes we can choose either to sweep or to compact. Both reclaim memory, and both work at page granularity (note that V8's memory pages are 1 MB contiguous blocks, unlike virtual-memory pages).

The sweeping algorithm scans for contiguous runs of dead objects, turns them into free space, and adds them to free lists. Each page maintains several free lists, for small regions (< 256 words), medium regions (< 2048 words), large regions (< 16384 words), and huge regions (anything larger). The sweeping algorithm is very simple: it just walks a page's mark bitmap looking for runs of white objects. The free lists are mostly consumed by the scavenge algorithm when allocating surviving objects, but they are also used by the compacting algorithm to relocate objects, and by certain kinds of objects that can only be allocated in old space.
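A minimal sketch of picking a free list by size class, using the thresholds from the paragraph above (the names and data layout are invented for illustration):

# Size-class thresholds in words, as described above.
SMALL, MEDIUM, LARGE = 256, 2048, 16384

free_lists = {"small": [], "medium": [], "large": [], "huge": []}

def free_list_for(size_in_words):
    """Pick the free list a freed region of `size_in_words` words belongs to."""
    if size_in_words < SMALL:
        return free_lists["small"]
    if size_in_words < MEDIUM:
        return free_lists["medium"]
    if size_in_words < LARGE:
        return free_lists["large"]
    return free_lists["huge"]

free_list_for(100).append(("page_7", 0x400, 100))  # (page, offset, size) of a freed run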

The compacting algorithm tries to free memory by migrating objects out of fragmented pages (pages containing many small free regions) and consolidating them. The objects are migrated onto other pages, which may require allocating new pages, and the evacuated fragmented pages can then be handed back to the operating system. The migration and consolidation process is quite involved, so I will only mention a few details rather than describe it fully. Roughly, it works like this: for each live object on a target fragmented page, space is allocated from a free list on some other page, the object is copied there, and a forwarding address is written onto the object left behind on the fragmented page. During evacuation, the old locations of pointers to these objects are recorded, so that after evacuation V8 walks the recorded locations and updates them to the new addresses. Since pointers between different pages were also recorded during marking, those pointers are updated at the same time. Note that if a page is very "busy", i.e. it has too many pointers that would need recording, the recording skips it and the page is left until the next round of garbage collection.

Incremental Marking and Lazy Sweeping

As you might expect, when the heap is large and contains many live objects, mark-sweep and mark-compact can run slowly. When I first started working on V8, garbage-collection pauses of 500-1000 milliseconds were not uncommon. That is clearly hard to accept, even on mobile devices.

In mid-2012, Google introduced two improvements that reduce garbage-collection pauses significantly: incremental marking and lazy sweeping.

Incremental marking allows the heap to be marked in a series of small pauses, on the order of 5-10 milliseconds each (on mobile devices). Incremental marking starts once the heap reaches a certain size. After it is enabled, every time a certain amount of memory has been allocated, script execution is paused and one increment of marking is performed. Like regular marking, incremental marking is essentially a depth-first search and uses the same white-grey-black scheme to classify objects.

The difference between incremental marking and regular marking is that while marking is in progress the object graph can change! What we have to watch out for in particular are new pointers from black objects to white objects. Recall that a black object has already been fully scanned by the garbage collector and will not be scanned again, so if a "black → white" pointer appears, we could miss the white object and mistakenly treat it as dead. (Translator's note: objects still white after marking are considered dead.) So the write barrier comes to the rescue again: it now records not only "old → new" pointers but also "black → white" pointers. Whenever such a pointer is found, the black object is re-coloured grey and pushed back onto the deque. When the algorithm later pops that object, the pointers inside it are rescanned, so the live white object is not missed.
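A hypothetical sketch of that extended barrier, reusing the HeapObject class and colour constants from the earlier sketches (the function names and parameters are invented, not V8's):

def incremental_write_field(holder, slot, value, color_of, set_color, marking_deque):
    """Pointer store during incremental marking: catch black -> white edges."""
    holder.fields[slot] = value
    if isinstance(value, HeapObject) and color_of(holder) == BLACK and color_of(value) == WHITE:
        # The holder was already scanned, so re-grey it and rescan it later;
        # otherwise `value` would stay white and be swept as garbage.
        set_color(holder, GREY)
        marking_deque.append(holder)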

After incremental marking completes, lazy sweeping begins. All objects have been processed, so each one is definitively dead or alive, and the amount of heap space that can become free is already settled. We are in no hurry to release that space, and delaying the sweep does little harm, so there is no need to sweep all pages at once: the garbage collector sweeps pages one by one, as needed, until every page has been swept. At that point incremental marking is ready to start again.
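A minimal sketch of sweeping on demand, assuming a hypothetical sweep_page() callback that sweeps one page and returns how many words it freed (illustrative only):

def lazy_sweep(unswept_pages, words_needed, sweep_page):
    """Sweep just enough pages to satisfy an allocation of `words_needed` words."""
    freed = 0
    while unswept_pages and freed < words_needed:
        page = unswept_pages.pop()
        freed += sweep_page(page)   # adds the page's dead regions to its free lists
    return freed

# Usage: sweep until roughly 1024 words are available, with a stand-in sweep function.
pages = ["page_a", "page_b", "page_c"]
print(lazy_sweep(pages, 1024, sweep_page=lambda p: 700))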

Google has also recently added support for parallel sweeping. Since the script's execution thread no longer touches dead objects, pages can be swept on separate threads with minimal synchronization. Similar work is under way for parallel marking, but that is still at an early experimental stage.

Summary

Garbage collection is really complicated. I've skipped plenty of details in this article and it still got very long. A colleague of mine once said he found working on the garbage collector scarier than working on register allocation, and I agree. That said, I would rather leave these tedious details to the runtime than push them onto every application developer. Even though garbage collection has some performance issues and the occasional odd behaviour, it frees us from a great deal of bookkeeping so that we can focus on more important things.
