I came across something that I don't understand and hope you can help.
I read in a few articles that we could make the GC's job easier by setting large slices and maps (I guess this applies to all reference types) to nil once we no longer need them. Here's one of the examples I've read:
    func ProcessResponse(resp *http.Response) error {
        data, err := ioutil.ReadAll(resp.Body)
        if err != nil {
            return err
        }
        // Process data here
        data = nil // Release memory
        return nil
    }
From what I understand, when the function ProcessResponse completes, the data variable goes out of scope and basically ceases to exist. The GC will then verify that there are no references left to the []byte slice (the one data pointed to) and will reclaim the memory.
How does setting data to nil improve garbage collection?
Thanks!
Setting data = nil before returning does not change anything on the GC side. The Go compiler applies optimizations, and Go's garbage collector works in distinct phases. In the simplest terms (with many omissions and oversimplifications): setting data = nil, and thereby removing all references to the underlying array, does not trigger an immediate, atomic-style release of the memory that is no longer referenced. Once the slice is no longer referenced, it is marked as such, and the associated memory is not released until the next sweep.
Garbage collection is a hard problem, in large part because it is not the kind of problem that has one optimal solution producing the best results for every use case. The Go runtime has evolved a lot over the years, and a significant part of that work has gone into the garbage collector. The result is that only in rare cases will a simple somevar = nil make even a small difference, let alone a noticeable one.
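The point that dropping a reference does not free anything by itself can be checked with a small experiment. The following is a minimal sketch, not a benchmark: the names blockSize, heapAlloc and measure are made up for this illustration, and automatic collection is switched off with debug.SetGCPercent(-1) only so that the forced runtime.GC() call is the single collection cycle in the run.

```go
package main

import (
	"fmt"
	"runtime"
	"runtime/debug"
)

const blockSize = 64 << 20 // 64 MiB, large enough to dominate the heap

// heapAlloc returns the bytes currently held by heap objects, including
// unreachable ones that the collector has not freed yet.
func heapAlloc() uint64 {
	var m runtime.MemStats
	runtime.ReadMemStats(&m)
	return m.HeapAlloc
}

// measure allocates a large slice, drops the only reference to it, and
// reports heap usage right after the nil assignment and after a forced GC.
func measure() (afterNil, afterGC uint64) {
	// Disable automatic collections so that the only GC cycle is the
	// explicit one below; restore the previous setting on return.
	old := debug.SetGCPercent(-1)
	defer debug.SetGCPercent(old)

	data := make([]byte, blockSize)
	data[len(data)-1] = 1 // touch the slice so it is really used

	data = nil // drop the reference; nothing is freed at this point
	afterNil = heapAlloc()

	runtime.GC() // a full collection cycle reclaims the unreachable block
	afterGC = heapAlloc()
	return afterNil, afterGC
}

func main() {
	afterNil, afterGC := measure()
	fmt.Printf("heap after data = nil:   %d MiB\n", afterNil>>20)
	fmt.Printf("heap after runtime.GC(): %d MiB\n", afterGC>>20)
}
```

The heap figure stays above 64 MiB after the assignment and only drops once a collection cycle has run, which is the same thing that would happen if the variable had simply gone out of scope.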
If you're looking for simple rule-of-thumb hints that can affect the runtime overhead associated with garbage collection (or runtime memory management in general), I do know of one that seems to be vaguely covered by this sentence in your question:
It is suggested that we can make the GC's job easier by setting large slices and maps
This is something that can produce noticeable results when profiling code. Say you're reading a large amount of data that needs to be processed, or you have to perform some other kind of batch operation and return a slice. It's not uncommon to see people write something like this:
    func processstuff(input []sometypes) []resulttypes {
        data := []resulttypes{}
        for _, in := range input {
            data = append(data, processt(in))
        }
        return data
    }
This can easily be optimized by changing the code to:
    func processstuff(input []sometypes) []resulttypes {
        data := make([]resulttypes, 0, len(input)) // set cap
        for _, in := range input {
            data = append(data, processt(in))
        }
        return data
    }
What happens in the first implementation is that you create a slice with a len and cap of 0. The first time you call append, you exceed the slice's current capacity, which causes the runtime to allocate memory. In a simplified model (the actual growth strategy of append is an implementation detail that has changed between Go versions), the new capacity is calculated quite simply, the memory is allocated, and the existing data is copied over:
    t := make([]byte, len(s), (cap(s)+1)*2)
    copy(t, s)
Essentially, every time you call append when the slice you are appending to is full (i.e. len == cap), you allocate a new slice that can hold (len + 1) * 2 elements. Knowing that, in the first example, data starts with len and cap == 0, let's see what this means:
1st iteration: append creates a slice with cap (0 + 1) * 2, data is now len 1, cap 2
2nd iteration: append adds to data, which now has len 2, cap 2
3rd iteration: append allocates a new slice with cap (2 + 1) * 2, copies the 2 elements from data to this slice and adds the third; data is now reassigned to a slice with len 3, cap 6
4th-6th iterations: data grows to len 6, cap 6
7th iteration: same as the 3rd iteration, although cap is now (6 + 1) * 2; everything is copied over, and data is reassigned to a slice with len 7, cap 14
If the data structures in the slice are on the larger side (i.e. many nested structures, lots of indirection, etc.), then this frequent reallocating and copying can become quite expensive. If your code contains a lot of these loops, it will start to show up in pprof (you'll start seeing a lot of time being spent in runtime.mallocgc). Additionally, if you were processing 15 input values, your data slice would end up looking like this:
    dataslice {
        len: 15
        cap: 30
        data underlying_array[30]
    }
This means that you will have allocated memory for 30 values when you only needed 15, and you will have allocated that memory in 4 progressively larger chunks, copying the data on each reallocation.
In contrast, the second implementation allocates a data slice like this before the loop:
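The iteration-by-iteration walk above can be replayed mechanically. This is a sketch of the simplified (cap + 1) * 2 rule only; simulateGrowth is a name invented for this illustration, and the real runtime uses a different growth policy, so the actual capacities you observe with cap() will not match these numbers.

```go
package main

import "fmt"

// simulateGrowth replays n appends on an initially empty slice under the
// simplified rule from the text: when len == cap, the new capacity is
// (cap + 1) * 2. It returns the capacity of every reallocation, in order.
// This is a model only; the real append growth policy differs.
func simulateGrowth(n int) []int {
	var caps []int
	length, capacity := 0, 0
	for i := 0; i < n; i++ {
		if length == capacity {
			capacity = (capacity + 1) * 2
			caps = append(caps, capacity)
		}
		length++
	}
	return caps
}

func main() {
	caps := simulateGrowth(15)
	fmt.Println(caps)      // [2 6 14 30]
	fmt.Println(len(caps)) // 4
}
```

For 15 inputs the model reallocates four times and ends with room for 30 elements, matching the figures quoted above.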
    data {
        len: 0
        cap: 15
        data underlying_array[15]
    }
It is allocated once, so no reallocation or copying is required, and the returned slice takes up half the space in memory. In this sense, we start by allocating a larger block of memory up front to reduce the number of incremental allocations and copies needed later on, which overall reduces the runtime cost.
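One way to see the difference between the two versions without a full profiling session is to count heap allocations per call with testing.AllocsPerRun from the standard library. The sketch below uses []int and a hypothetical double function in place of the sometypes, resulttypes and processt placeholders; the exact count for the version without a cap depends on the Go version, so only the comparison is meaningful.

```go
package main

import (
	"fmt"
	"testing"
)

// sink is a package-level variable; storing results in it forces the
// slices to be heap-allocated so that the allocations can be counted.
var sink []int

// double stands in for the per-element processing step (hypothetical).
func double(v int) int { return v * 2 }

// processNoCap grows the result slice on demand, as in the first version.
func processNoCap(input []int) []int {
	data := []int{}
	for _, in := range input {
		data = append(data, double(in))
	}
	return data
}

// processWithCap reserves the final capacity up front, as in the second version.
func processWithCap(input []int) []int {
	data := make([]int, 0, len(input))
	for _, in := range input {
		data = append(data, double(in))
	}
	return data
}

// allocsPerCall reports the average number of heap allocations per call of f.
func allocsPerCall(f func([]int) []int, input []int) float64 {
	return testing.AllocsPerRun(100, func() { sink = f(input) })
}

func main() {
	input := make([]int, 1000)
	fmt.Println("without cap:", allocsPerCall(processNoCap, input))
	fmt.Println("with cap:   ", allocsPerCall(processWithCap, input))
}
```

The version with a preset cap should report a single allocation per call, while the other reports several, one for each time append has to grow the backing array.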
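That's a fair question. This example won't always apply. In this case, we knew how many elements we would need and could allocate memory accordingly. Sometimes, that's just not how the world works. If you don't know how much data you'll end up needing, then you can: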
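No, setting a simple slice variable to nil won't make much of a difference in 99% of cases. When creating and appending to maps/slices, what is more likely to make a difference is cutting down on extraneous allocations by using make() and specifying a sensible cap value. Other things that can make a difference are using pointer types/receivers, although that's a more complex topic to delve into. For now, I'll just say that I've been working on a code base that has to operate on numbers well beyond the range of a typical uint64, and unfortunately we also have to be able to use decimals more precisely than float64 allows. We solved the uint64 problem by using something like holiman/uint256, which uses pointer receivers, and addressed the decimal problem with shopspring/decimal, which uses value receivers and copies everything. After spending a lot of time optimizing the code, we've reached the point where the performance impact of constantly copying values when using decimals has become an issue. Look at how these packages implement simple operations like addition, and try to work out which operation is more costly: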
    // original
    a, b := 1, 2
    a += b

    // uint256 version
    a, b := uint256.NewInt(1), uint256.NewInt(2)
    a.Add(a, b)

    // decimal version
    a, b := decimal.NewFromInt(1), decimal.NewFromInt(2)
    a = a.Add(b)
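The difference between the two calling styles can be illustrated without either library. The type wide and its methods AddValue and AddPtr below are hypothetical and are not how holiman/uint256 or shopspring/decimal are implemented; the sketch only shows which operands get copied under each receiver style.

```go
package main

import "fmt"

// wide is a hypothetical 256-bit-sized value (four 64-bit words), used only
// to make the cost of copying visible; carries are ignored for brevity.
type wide [4]uint64

// AddValue uses a value receiver: the receiver and the argument are copied
// in, and a new value is copied out, as in a = a.Add(b).
func (w wide) AddValue(o wide) wide {
	for i := range w {
		w[i] += o[i]
	}
	return w
}

// AddPtr uses a pointer receiver and stores x + y in the receiver without
// copying any operand, as in a.Add(a, b).
func (w *wide) AddPtr(x, y *wide) *wide {
	for i := range w {
		w[i] = x[i] + y[i]
	}
	return w
}

func main() {
	a, b := wide{1}, wide{2}
	a = a.AddValue(b) // copies in and out
	fmt.Println(a[0]) // 3

	p, q := &wide{1}, &wide{2}
	p.AddPtr(p, q)    // updates *p in place
	fmt.Println(p[0]) // 3
}
```

With a four-word array the copies are cheap, but the same pattern applied to larger values in a hot loop is where the copying cost described above starts to show.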
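These are just a few of the things I've spent time optimizing in my recent work, but the single most important takeaway is this:
When you're working on more complex problems/code, getting to the point where you're looking into the allocation cycles of slices or maps as potential bottlenecks and optimizations takes a lot of effort. You can, and arguably should, take measures to avoid being too wasteful (e.g. setting a slice's cap if you know what its final length will be), but you shouldn't waste too much time handcrafting every line until the memory footprint of that code is as small as it can be. The cost will be: code that is more fragile and harder to maintain and read, potentially worse overall performance (seriously, you can trust the Go runtime to do a decent job), lots of blood, sweat, and tears, and a steep decline in productivity.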