Memory management is a crucial component of computer software development, tasked with the effective allocation, utilization, and release of memory in applications. Its importance lies in enhancing software performance and ensuring system stability.
Garbage collection (GC) is pivotal in contemporary programming languages such as Java and Go. It autonomously detects and recycles unused memory, thereby alleviating the need for developers to manually manage memory. The concept of GC originally emerged in the LISP programming language in the late 1950s, marking the introduction of automated memory management.
Key advantages of automated memory management include:
Understanding the nature of "garbage" in memory and identifying reclaimable space is essential. In the upcoming chapters, we will start by exploring the fundamental principles of garbage collection.
The Reference Counting Algorithm assigns a field in the object's header to track its reference count. This count increases with each new reference and decreases when a reference is removed. When the count reaches zero, the object is eligible for garbage collection.
Consider the following code:
First, create a String with the value "demo", referenced by d (Figure 1).
String d = new String("demo");
Figure 1 – After a String is created
Then, set d to null. The reference count of demo drops to zero, so under the Reference Counting algorithm its memory can be reclaimed (Figure 2).
d = null; // Reference count of 'demo' becomes zero, prompting garbage collection.
Figure 2 – When the reference is nullified
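The bookkeeping described above can be sketched as a toy model. This is only an illustration: the JVM does not expose or use such counters, and the names RefCounted, retain, and release are hypothetical, chosen for this example.

```java
// Toy model of reference counting. All names here are hypothetical;
// the JVM itself does not manage objects this way.
final class RefCounted {
    private final String value;
    private int refCount = 0;          // stands in for the count field in the object header
    private boolean reclaimed = false; // set once the count drops back to zero

    RefCounted(String value) {
        this.value = value;
    }

    // Called whenever a new reference to this object is created.
    RefCounted retain() {
        refCount++;
        return this;
    }

    // Called whenever a reference is removed; the object becomes reclaimable at zero.
    void release() {
        if (refCount == 0) {
            throw new IllegalStateException("no references left to release");
        }
        refCount--;
        if (refCount == 0) {
            reclaimed = true;
        }
    }

    int refCount() {
        return refCount;
    }

    boolean isReclaimed() {
        return reclaimed;
    }
}

class RefCountDemo {
    public static void main(String[] args) {
        RefCounted d = new RefCounted("demo").retain(); // models: String d = new String("demo");
        System.out.println(d.refCount());               // prints 1
        d.release();                                    // models: d = null;
        System.out.println(d.isReclaimed());            // prints true
    }
}
```

Because the count is updated at the moment a reference changes, reclamation happens incrementally rather than in a separate collection pause.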
The Reference Counting Algorithm operates during program execution, avoiding Stop-The-World events, which halt the program temporarily for garbage collection. However, its major drawback is the inability to handle circular references (Figure 3).
For example:
public class CircularReferenceDemo {
    public CircularReferenceDemo reference;
    private String name;

    public CircularReferenceDemo(String name) {
        this.name = name;
    }

    public void setReference(CircularReferenceDemo ref) {
        this.reference = ref;
    }

    public static void main(String[] args) {
        CircularReferenceDemo objA = new CircularReferenceDemo("Ref_A");
        CircularReferenceDemo objB = new CircularReferenceDemo("Ref_B");

        // Each object now holds a reference to the other.
        objA.setReference(objB);
        objB.setReference(objA);

        // Drop the external references; only the mutual references remain.
        objA = null;
        objB = null;
    }
}
Here, even though the external references are set to null, the mutual references between objA and objB keep each object's count above zero, so a reference-counting collector would never reclaim them.
Figure 3 – Circular References
Neither object can be accessed any longer. However, because they still reference each other, their reference counts never reach zero. Consequently, a collector based on the Reference Counting algorithm is never notified to reclaim them.
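To make the stuck counts concrete, here is a minimal simulation of the same scenario with explicit counters. The class CountedNode and its manual count updates are assumptions made for illustration; they mimic what a reference-counting runtime would do automatically.

```java
// Hypothetical simulation of reference counts for the circular case above.
final class CountedNode {
    final String name;
    int refCount = 0;       // incoming references to this node
    CountedNode reference;  // outgoing reference, like the 'reference' field above

    CountedNode(String name) {
        this.name = name;
    }

    // Re-points the outgoing reference and adjusts both affected counts.
    void setReference(CountedNode target) {
        if (reference != null) {
            reference.refCount--;
        }
        reference = target;
        if (target != null) {
            target.refCount++;
        }
    }
}

class CycleDemo {
    // Returns the counts of Ref_A and Ref_B after the external references are dropped.
    static int[] countsAfterDroppingExternalReferences() {
        CountedNode a = new CountedNode("Ref_A");
        a.refCount++;       // the local variable objA refers to it
        CountedNode b = new CountedNode("Ref_B");
        b.refCount++;       // the local variable objB refers to it

        a.setReference(b);  // Ref_B count: 2
        b.setReference(a);  // Ref_A count: 2

        a.refCount--;       // models: objA = null;
        b.refCount--;       // models: objB = null;
        return new int[] { a.refCount, b.refCount };
    }

    public static void main(String[] args) {
        int[] counts = countsAfterDroppingExternalReferences();
        System.out.println("Ref_A=" + counts[0] + ", Ref_B=" + counts[1]); // prints Ref_A=1, Ref_B=1
    }
}
```

Both counts settle at one rather than zero, which is exactly why neither object would ever be handed to the collector.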
In practice, this algorithm is implemented in C++ by std::shared_ptr. Designed to manage the lifecycle of dynamically allocated objects, std::shared_ptr automatically increments and decrements the reference count as pointers to the object are created or destroyed. This smart pointer is part of the C++ Standard Library and significantly reduces the risks associated with manual memory handling. Whenever a std::shared_ptr is copied, the reference count of the managed object increases to reflect the new reference. Conversely, when a std::shared_ptr is destroyed, goes out of scope, or is reassigned to a different object, the reference count decreases. When the count reaches zero, the object is destroyed and its memory is reclaimed automatically, which prevents memory leaks as long as the objects do not form reference cycles.
The Reachability Analysis Algorithm begins at the GC roots and traverses the object graph. Objects that cannot be reached from these roots are deemed unreachable and are targeted for collection.
As shown in the image below, the objects in blue circles are kept alive and the objects in gray circles can be recycled (Figure 4).
Figure 4 – Memory leak
This method resolves the circular reference problem inherent in the Reference Counting Algorithm: objects that reference only each other are still unreachable from the GC roots, so they are collected.
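The traversal can be sketched as a simple marking pass over an explicit object graph. This is a minimal illustration under stated assumptions, not how any particular JVM implements it: Node and Reachability are hypothetical names, and the "roots" are just a collection passed in by the caller.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Collection;
import java.util.Collections;
import java.util.Deque;
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Set;

// Hypothetical object graph node: a name plus outgoing references.
final class Node {
    final String name;
    final List<Node> refs = new ArrayList<>();

    Node(String name) {
        this.name = name;
    }
}

final class Reachability {
    // Returns every node reachable from the given roots (the "mark" phase).
    static Set<Node> mark(Collection<Node> roots) {
        // Identity-based set: nodes are compared by reference, as a collector would.
        Set<Node> reachable = Collections.newSetFromMap(new IdentityHashMap<>());
        Deque<Node> worklist = new ArrayDeque<>(roots);
        while (!worklist.isEmpty()) {
            Node current = worklist.pop();
            if (reachable.add(current)) {      // first visit only, so cycles terminate
                worklist.addAll(current.refs);
            }
        }
        return reachable;
    }
}

class ReachabilityDemo {
    public static void main(String[] args) {
        Node root = new Node("root");
        Node live = new Node("live");
        Node a = new Node("Ref_A");
        Node b = new Node("Ref_B");

        root.refs.add(live);
        a.refs.add(b);   // Ref_A and Ref_B reference each other...
        b.refs.add(a);   // ...but nothing reachable from the root points to them.

        Set<Node> reachable = Reachability.mark(List.of(root));
        System.out.println(reachable.contains(live)); // prints true
        System.out.println(reachable.contains(a));    // prints false
    }
}
```

Anything left outside the returned set, including the mutually referencing pair, is garbage regardless of how many references point at it.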
Typically, Java objects considered as GC roots include:
GraalVM offers an ahead-of-time (AOT) compiler, which translates Java applications into standalone executable binaries known as GraalVM Native Images. Developed by Oracle Labs, these binaries encapsulate application and library classes, and runtime components like the GC, allowing operations without a Java Runtime Environment (JRE).
The build process involves static analysis to determine the reachable components, initialization by executing initializer blocks, and finalization, which creates a snapshot of the application state for subsequent translation into machine code.
Substrate VM is an integral part of the GraalVM suite, developed by Oracle Labs. It is an enhanced JVM that not only supports ahead-of-time (AOT) compilation but also facilitates the execution of languages beyond Java, such as JavaScript, Python, Ruby, and even native languages like C and C++. At its core, Substrate VM serves as a framework that allows GraalVM to compile Java applications into standalone native binaries. These binaries do not rely on a conventional Java Virtual Machine (JVM) for their execution, which streamlines deployment and operations.
One of the cardinal features of Substrate VM is its specialized garbage collector, which is fine-tuned for applications requiring low latency and minimal memory footprint. This garbage collector is adept at handling the unique memory layout and operational model distinct to native images, which differ considerably from traditional Java applications running on a standard JVM. The absence of a Just-In-Time (JIT) compiler in Substrate VM native images is a strategic choice that aids in minimizing the overall size of the executable. This is because it eliminates the necessity to include the JIT compiler and associated metadata, which are substantial in size and complexity.
Furthermore, while GraalVM is developed using Java, this introduces certain constraints, particularly in terms of native memory access. Such restrictions are primarily due to security concerns and the need to maintain compatibility across various platforms. However, accessing native memory is essential for optimal garbage collection operations. To address this, Substrate VM employs a suite of specialized interfaces that facilitate safe and efficient interactions with native memory. These interfaces are part of the broader GraalVM architecture and enable Substrate VM to effectively manage memory in a manner akin to lower-level languages like C, all while retaining the safety and manageability of Java.
In practice, these capabilities make Substrate VM an extremely versatile tool that enhances the functionality and efficiency of applications compiled with GraalVM. By allowing developers to leverage a broader range of programming languages and compile them into efficient native binaries, Substrate VM pushes the boundaries of what can be achieved with traditional Java development environments. This makes it an invaluable asset for modern software development projects that demand high performance, reduced resource consumption, and versatile language support.
Noteworthy elements of Substrate VM include:
Simplified memory access via interfaces such as Pointer for raw memory operations and WordBase for handling word-sized values.
Division of the heap into pre-initialized segments containing immutable objects and runtime segments for dynamic object allocation (Figure 5).
Figure 5 – Memory Management in Native Image
At runtime, the so-called image heap in Substrate VM contains objects created during the image build process. This section of the heap is pre-initialized with data from the executable binary's data section and is readily accessible upon application startup. The objects residing in the image heap are considered immortal; hence, references within these objects are treated as root pointers by the garbage collector. However, the GC only scans parts of the image heap for root pointers, specifically those that are not marked as read-only.
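Objects designated as read-only during the build process are placed in a dedicated read-only section of the image heap. Because these objects never hold references to objects allocated at runtime, they contain no root pointers, and the GC can bypass them during scanning. Similarly, objects that consist solely of primitive data or arrays of primitive types contain no root pointers. This property further streamlines garbage collection, since such objects can be omitted from GC scans.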
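In contrast, the Java heap holds the ordinary objects that are created dynamically at runtime. This part of the heap undergoes regular garbage collection to reclaim the space occupied by objects that are no longer in use. It is organized as a generational heap with an aging mechanism, which enables efficient memory management over time.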
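This division between the pre-initialized, immortal image heap and the dynamically managed Java heap allows Substrate VM to optimize memory usage and garbage collection efficiency while serving both the static and dynamic aspects of an application's memory requirements.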
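In Substrate VM's heap model, memory is systematically organized into structures known as heap chunks. These chunks, which typically have a default size of 1024 KB, form contiguous virtual memory segments allocated solely for object storage. The chunks are organized as a linked list in which the tail chunk is the most recently added segment. This model facilitates efficient memory allocation and object management.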
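These heap chunks are further classified into two types: aligned and unaligned. Aligned heap chunks can hold multiple objects consecutively. This alignment makes it easier to map objects to their respective parent heap chunks, which makes memory management more intuitive and efficient. In scenarios where object promotion is necessary (typically during garbage collection and memory optimization), objects are moved from their original location in the parent heap chunk to a target heap chunk in the designated "old space". This migration is part of the generational heap management strategy, which optimizes garbage collection by separating young objects from old ones and thereby reducing overhead during GC cycles.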
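The following sketch illustrates why alignment makes the object-to-chunk mapping cheap. It is a simplified model under several assumptions: addresses are simulated as long values, chunk headers are ignored, objects larger than one chunk are not handled, and the names AlignedChunk and ChunkSpace are hypothetical rather than Substrate VM's actual classes.

```java
// Simplified model of aligned heap chunks; not Substrate VM's real implementation.
final class AlignedChunk {
    static final long CHUNK_SIZE = 1024L * 1024L; // 1024 KB, the default size mentioned above

    final long start;             // simulated start address, always a multiple of CHUNK_SIZE
    long top;                     // next free address inside this chunk
    final AlignedChunk previous;  // link to the chunk that was added before this one

    AlignedChunk(long start, AlignedChunk previous) {
        if ((start & (CHUNK_SIZE - 1)) != 0) {
            throw new IllegalArgumentException("chunk start must be aligned to CHUNK_SIZE");
        }
        this.start = start;
        this.top = start;
        this.previous = previous;
    }

    // Places an object directly after the previous one; returns -1 if it does not fit.
    long allocate(long size) {
        if (top + size > start + CHUNK_SIZE) {
            return -1;
        }
        long address = top;
        top += size;
        return address;
    }

    // Because chunks are aligned, masking the low bits yields the parent chunk's start.
    static long chunkStartOf(long objectAddress) {
        return objectAddress & ~(CHUNK_SIZE - 1);
    }
}

final class ChunkSpace {
    AlignedChunk tail;                               // most recently added chunk
    private long nextStart = AlignedChunk.CHUNK_SIZE; // simulated address of the next chunk

    long allocate(long size) {
        if (size <= 0 || size > AlignedChunk.CHUNK_SIZE) {
            throw new IllegalArgumentException("size not supported by this sketch");
        }
        long address = (tail == null) ? -1 : tail.allocate(size);
        if (address < 0) {
            // Current tail is missing or full: append a new chunk and retry there.
            tail = new AlignedChunk(nextStart, tail);
            nextStart += AlignedChunk.CHUNK_SIZE;
            address = tail.allocate(size);
        }
        return address;
    }
}

class ChunkDemo {
    public static void main(String[] args) {
        ChunkSpace space = new ChunkSpace();
        long first = space.allocate(64);
        long second = space.allocate(600 * 1024);
        long third = space.allocate(600 * 1024); // does not fit, so a new tail chunk is added
        System.out.println(first);                            // prints 1048576
        System.out.println(AlignedChunk.chunkStartOf(second)); // prints 1048576
        System.out.println(AlignedChunk.chunkStartOf(third));  // prints 2097152
    }
}
```

The single bit mask in chunkStartOf is the point of the design: no lookup table is needed to find the chunk that owns an object.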
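GraalVM Native Image supports different GCs tailored to different needs:
Serial GC: The default low-footprint collector, suitable for single-threaded applications.
G1 Garbage Collector: Designed for multi-threaded applications with large heap sizes, offering greater flexibility in generation management.
Epsilon GC: A minimal collector that handles allocation but performs no reclamation, best suited to short-lived applications whose total heap utilization is predictable.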
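In conclusion, Substrate VM effectively optimizes memory management within GraalVM by incorporating advanced techniques such as specialized garbage collection and structured heap management. These features, including heap chunks and separate memory segments for the image heap and the Java heap, streamline garbage collection and improve application performance. Because Substrate VM supports various programming languages and compiles them into efficient native binaries, it shows how a modern JVM framework can extend beyond traditional boundaries to improve execution efficiency and robustness across diverse application environments. This approach sets a high standard for future developments in virtual machine technology and application deployment.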