Table of Contents
What does the memory distribution of a Linux process look like?
How does malloc allocate memory?
How much virtual memory will malloc(1) allocate?
Why not use mmap for all allocations?
Since brk is so awesome, why not use brk for all allocation?
free() only receives a memory address. How does it know how much memory to release?
Understand Linux memory allocation strategy in one article

Feb 12, 2024 am 11:57 AM
What does the memory distribution of a Linux process look like?

In Linux, a process's virtual address space is divided into kernel space and user space. Their sizes depend on the word size of the system. For the two most common cases, 32-bit and 64-bit systems:

  • On a 32-bit system, kernel space occupies the top 1 GB, and the remaining 3 GB is user space;
  • On a 64-bit system, kernel space and user space are 128 TB each, occupying the highest and lowest parts of the address space respectively; the region in between is undefined.
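The sizes quoted above can be checked with a little arithmetic. The sketch below assumes the usual 3G/1G boundary at 0xC0000000 on 32-bit Linux and 48-bit virtual addresses on x86-64; both constants and all function names are illustrative assumptions, not values taken from the article.

```c
#include <stdint.h>

/* Assumed split points: 0xC0000000 is the usual 3G/1G boundary on 32-bit
 * Linux; 0x00007FFFFFFFFFFF is the top of user space on x86-64 with
 * 48-bit virtual addresses. Other configurations differ. */
#define USER_TOP_32 UINT64_C(0xC0000000)
#define USER_TOP_64 UINT64_C(0x00007FFFFFFFFFFF)

/* Size of 32-bit user space in GiB: addresses 0 .. USER_TOP_32 - 1. */
uint64_t user_space_gib_32(void) {
    return USER_TOP_32 >> 30;
}

/* Size of 32-bit kernel space in GiB: the rest of the 4 GiB range. */
uint64_t kernel_space_gib_32(void) {
    return ((UINT64_C(1) << 32) - USER_TOP_32) >> 30;
}

/* Size of 64-bit user space in TiB: addresses 0 .. USER_TOP_64. */
uint64_t user_space_tib_64(void) {
    return (USER_TOP_64 + 1) >> 40;
}
```

Under these assumptions the three functions return 3, 1, and 128, matching the 3 GB, 1 GB, and 128 TB figures in the text.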

The difference between kernel space and user space is as follows:

  • While a process runs in user mode, it can only access user space memory;
  • Only after entering kernel mode can it access kernel space memory.

Although each process has its own independent virtual memory, the kernel addresses in every process's virtual memory map to the same physical memory. As a result, once a process switches to kernel mode, it can easily access kernel space memory.

Next, let's look more closely at how the virtual address space is divided. User space and kernel space are laid out differently; the kernel space layout is not covered here.

Taking a 32-bit system as an example, user space memory is divided into 6 segments, from low addresses to high:

  • Program text segment, containing the executable binary code;
  • Initialized data segment, containing initialized static variables and constants;
  • Uninitialized data segment (BSS), containing uninitialized static variables;
  • Heap segment, containing dynamically allocated memory; it starts at a low address and grows upward;
  • File mapping segment, containing dynamic libraries, shared memory, and so on; it starts at a low address and grows upward (this depends on the hardware and kernel version);
  • Stack segment, containing local variables, function call context, and so on. The stack size is limited, typically to 8 MB by default; the system provides a parameter for customizing this limit.

Among these 6 memory segments, the memory of the heap and file mapping segments is dynamically allocated. For example, using malloc() or mmap() of the C standard library, you can dynamically allocate memory in the heap and file mapping segments respectively.
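One way to see these segments in a running program is to sample one address from each. The following is a minimal sketch; the struct and function names are made up for illustration, and the exact addresses (and even their relative order) depend on the platform, the linker, and address space randomization.

```c
#include <stdint.h>
#include <stdlib.h>

/* One sample address from each region discussed above. */
struct layout_sample {
    uintptr_t text;   /* address of a function (program text)      */
    uintptr_t data;   /* address of an initialized static variable */
    uintptr_t bss;    /* address of an uninitialized static        */
    uintptr_t heap;   /* address returned by a small malloc()      */
    uintptr_t stack;  /* address of a local variable               */
};

static int initialized_var = 42;  /* lives in the initialized data segment */
static int uninitialized_var;     /* lives in the uninitialized data (BSS) */

struct layout_sample sample_layout(void) {
    struct layout_sample s;
    int local = 0;
    void *block = malloc(16);

    s.text  = (uintptr_t)&sample_layout;
    s.data  = (uintptr_t)&initialized_var;
    s.bss   = (uintptr_t)&uninitialized_var;
    s.heap  = (uintptr_t)block;
    s.stack = (uintptr_t)&local;

    free(block);  /* only the address is of interest here */
    return s;
}
```

Printing the five fields and comparing them with the ranges listed in /proc/<pid>/maps for the same process shows which mapping each address falls into.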

How does malloc allocate memory?

Actually, malloc() is not a system call, but a function in the C library for dynamically allocating memory.

When malloc needs memory, it can request it from the operating system in two ways:

  • Method 1: allocate memory from the heap through the brk() system call;
  • Method 2: allocate memory in the file mapping area through the mmap() system call.

Method 1 is simple: the brk() function moves the "top of heap" pointer toward higher addresses to obtain new memory space.

Method 2 uses a "private anonymous mapping" in the mmap() system call to allocate a piece of memory in the file mapping area; in effect, it "steals" a piece of memory from the file mapping area.
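The two methods can be tried directly from user code. The sketch below is only an illustration: it uses sbrk(), the library wrapper that moves the program break by an increment, in place of a raw brk() call, and the helper names are hypothetical. It is not how glibc's malloc is written internally.

```c
#define _DEFAULT_SOURCE  /* exposes sbrk() and MAP_ANONYMOUS in glibc */
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>
#include <unistd.h>

/* Method 1: move the program break upward by `bytes`.
 * Returns the start of the newly obtained range, or NULL on failure. */
void *grow_heap(size_t bytes) {
    void *old_break = sbrk((intptr_t)bytes);
    return old_break == (void *)-1 ? NULL : old_break;
}

/* Method 2: a private anonymous mapping in the file mapping area.
 * Returns NULL on failure. */
void *map_anonymous(size_t bytes) {
    void *p = mmap(NULL, bytes, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    return p == MAP_FAILED ? NULL : p;
}

/* Return a mapping obtained from map_anonymous() to the kernel. */
int unmap_anonymous(void *p, size_t bytes) {
    return munmap(p, bytes);
}
```

After grow_heap(4096), sbrk(0) reports a break 4096 bytes higher than before, while a region from map_anonymous() shows up in /proc/<pid>/maps as a separate mapping without the [heap] label.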

When does malloc() allocate memory through brk(), and when through mmap()?

The malloc() source code defines a default threshold:

  • If the requested size is less than 128 KB, malloc() obtains the memory through brk();
  • If the requested size is 128 KB or more, malloc() obtains the memory through mmap().

Note that different glibc versions may define different thresholds.
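The threshold rule can be written down as a small decision function. This is a hypothetical helper, not glibc code: the constant is the 128 KB default quoted in the text, and a request of exactly 128 KB is treated as an mmap request, which matches the 128 KB experiment later in the article.

```c
#include <stddef.h>

/* Assumed default from the text; real glibc builds may differ. */
#define ASSUMED_MMAP_THRESHOLD (128 * 1024)

enum backend { BACKEND_BRK, BACKEND_MMAP };

/* Hypothetical helper: which system call the rule above would pick
 * when malloc has to ask the operating system for `size` bytes. */
enum backend pick_backend(size_t size) {
    return size >= ASSUMED_MMAP_THRESHOLD ? BACKEND_MMAP : BACKEND_BRK;
}
```

For example, pick_backend(1) yields BACKEND_BRK and pick_backend(128 * 1024) yields BACKEND_MMAP.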

Does malloc() allocate physical memory?

No, malloc() allocates virtual memory.

If the allocated virtual memory is never accessed, it is not mapped to physical memory, so it does not occupy physical memory.

Only when the allocated virtual address space is accessed does the operating system look up the page table and find that the page for that virtual address is not in physical memory. This triggers a page fault, and the operating system then establishes the mapping between the virtual memory and physical memory.
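This lazy mapping can be observed by comparing the resident (physically backed) page count of a process before and after it touches freshly allocated memory. The sketch below assumes a Linux system where /proc/self/statm is available; the function names are illustrative, and the exact numbers vary with the allocator and kernel.

```c
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

/* Resident (physically backed) pages of this process, read from the
 * second field of /proc/self/statm. Returns -1 if unavailable. */
long resident_pages(void) {
    long total = 0, resident = -1;
    FILE *f = fopen("/proc/self/statm", "r");
    if (f == NULL)
        return -1;
    if (fscanf(f, "%ld %ld", &total, &resident) != 2)
        resident = -1;
    fclose(f);
    return resident;
}

/* Allocate `bytes`, then write one byte per page.
 * Stores the resident page counts seen before and after the writes.
 * Returns 0 on success, -1 on failure. */
int touch_and_measure(size_t bytes, long *before, long *after) {
    long page = sysconf(_SC_PAGESIZE);
    volatile char *buf;
    if (page <= 0)
        return -1;
    buf = malloc(bytes);
    if (buf == NULL)
        return -1;
    *before = resident_pages();          /* allocated but untouched */
    for (size_t i = 0; i < bytes; i += (size_t)page)
        buf[i] = 1;                      /* first write faults the page in */
    *after = resident_pages();
    free((void *)buf);
    return (*before < 0 || *after < 0) ? -1 : 0;
}
```

Calling touch_and_measure() with a large size, such as 64 MB, should show the resident count growing only after the writes, not at the time of the malloc() call.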

How much virtual memory will malloc(1) allocate?

When malloc() allocates memory, it does not allocate exactly the number of bytes the user asked for; it pre-allocates a larger space to serve as a memory pool.

How much space is pre-allocated depends on the memory manager that malloc uses. We will analyze the default memory manager of malloc (ptmalloc2).

Next, let's run an experiment: the following code shows how much memory the operating system actually hands out when 1 byte is requested through malloc.

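#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

int main() {
  printf("Run cat /proc/%d/maps to view the memory mappings\n", (int)getpid());

  // Request 1 byte of memory
  void *addr = malloc(1);
  printf("Start address of the 1-byte block: %p\n", addr);
  printf("Run cat /proc/%d/maps to view the memory mappings\n", (int)getpid());

  // Block here until a character is entered
  getchar();

  // Release the memory
  free(addr);
  printf("Freed the 1 byte, but the heap itself is not released\n");

  getchar();
  return 0;
}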

Run the program (note that the glibc version used here is 2.17).

We can view the memory layout of the process through the /proc/<pid>/maps file. Here the maps file is filtered by the starting address of the 1-byte block to show the address range that contains it.

[root@xiaolin ~]# cat /proc/3191/maps | grep d730
00d73000-00d94000 rw-p 00000000 00:00 0                                  [heap]

The memory allocated in this example is less than 128 KB, so it is obtained from the heap through the brk() system call, which is why the [heap] mark appears on the far right.

As you can see, the address range of the heap is 00d73000-00d94000, which is 132 KB in size. This means that malloc(1) actually pre-allocates 132 KB of memory.
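The 132 KB figure is simply the end address minus the start address: 0xd94000 - 0xd73000 = 0x21000 = 135168 bytes. A small helper, hypothetical and shown only to make the arithmetic repeatable, can compute this from any line of the maps file:

```c
#include <stdio.h>

/* Size in KB of the address range at the start of a /proc/<pid>/maps
 * line such as "00d73000-00d94000 rw-p ...". Returns -1 if the line
 * does not start with a "start-end" pair of hexadecimal addresses. */
long region_size_kb(const char *maps_line) {
    unsigned long start = 0, end = 0;
    if (sscanf(maps_line, "%lx-%lx", &start, &end) != 2 || end < start)
        return -1;
    return (long)((end - start) / 1024);
}
```

Passing the [heap] line from the output above returns 132.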

Some readers may have noticed that the starting address printed by the program is d73010, while the maps file shows that the heap starts at d73000. Why the extra 0x10 (16 bytes)? Let's set this question aside for now and come back to it later.

Does free() return the memory to the operating system?

Let's continue the run above and see whether the heap memory is still there after the memory is released with free().

After the memory is freed, the heap memory still exists; it has not been returned to the operating system.

This is because, rather than releasing this 1 byte to the operating system, it is better to cache it in malloc's memory pool. When the process requests 1 byte again, the block can be reused directly, which is much faster.

Of course, when the process exits, the operating system reclaims all of its resources.

The behavior described above, where the heap memory still exists after free(), applies to memory that malloc obtained through brk().

If malloc obtained the memory through mmap(), it is returned to the operating system when free() releases it.

Let's verify this with an experiment: we request 128 KB through malloc, so that malloc allocates the memory through mmap.

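#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

int main() {
  // Request 128 KB of memory
  void *addr = malloc(128*1024);
  printf("Start address of the 128 KB block: %p\n", addr);
  printf("Run cat /proc/%d/maps to view the memory mappings\n", (int)getpid());

  // Block here until a character is entered
  getchar();

  // Release the memory
  free(addr);
  printf("Freed the 128 KB; the memory is returned to the operating system\n");

  getchar();
  return 0;
}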

Run the program and look at the memory mappings of the process: there is no [heap] mark on the far right of the region that contains this address, which shows that the memory was allocated from the file mapping area through an anonymous mmap mapping.

Then release the memory and check the starting address of the 128 KB block again: the mapping no longer exists, which shows that it has been returned to the operating system.

To answer the question "Is memory requested with malloc and released with free returned to the operating system?", we can summarize:

  • When the memory was obtained through brk(), free() does not return it to the operating system; it is cached in malloc's memory pool for later reuse;
  • When the memory was obtained through mmap(), free() returns it to the operating system, and the memory is truly released.
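The first case in the summary can be checked from inside a program by reading the top of the heap with sbrk(0) before and after freeing a small block. This is a sketch with a made-up function name; it assumes an allocator that, like the one described here, does not shrink the heap when a small block is freed.

```c
#define _DEFAULT_SOURCE  /* exposes sbrk() in glibc */
#include <stdlib.h>
#include <unistd.h>

/* Allocate and free a small block, recording the program break (the top
 * of the heap) before and after free(). Returns 0 on success. */
int break_around_small_free(size_t bytes, void **before, void **after) {
    void *block = malloc(bytes);
    if (block == NULL)
        return -1;
    *(volatile char *)block = 1;  /* use the block so it is really allocated */
    *before = sbrk(0);            /* top of heap while the block is allocated */
    free(block);
    *after = sbrk(0);             /* top of heap after the block is freed */
    return 0;
}
```

With a 1-byte request, the two recorded values are expected to be equal, which is another way of saying that the freed byte stayed in the pool instead of going back to the operating system.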

Why not use mmap for all allocations?

Because requesting memory from the operating system requires a system call. Executing a system call means entering kernel mode and then returning to user mode, and this mode switching takes considerable time.

Therefore, memory allocation should avoid frequent system calls. If mmap were used for every allocation, a system call would be needed every time.

In addition, because memory allocated by mmap is returned to the operating system every time it is freed, a virtual address newly allocated by mmap is always unmapped, so the first access to it triggers a page fault.

In other words, if memory is allocated frequently through mmap, every allocation costs a mode switch and also a page fault (on the first access to the virtual address), which leads to high CPU consumption.

To mitigate both problems, when malloc requests memory from the heap through the brk() system call, it pre-allocates a larger region as a memory pool, which is possible because the heap space is contiguous. When memory is freed, it is cached in the memory pool.

The next time memory is requested, the matching block is taken directly from the memory pool, and the mapping between the virtual and physical addresses of that block may still exist. This reduces both the number of system calls and the number of page faults, which greatly reduces CPU consumption.
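The benefit of pre-allocating can be illustrated with a toy pool that makes one large request and then serves many small allocations from it. Everything in this sketch is hypothetical: it is a simple bump allocator, it uses malloc() as a stand-in for a single brk() call, and it does not model ptmalloc2's free lists or reuse of freed blocks.

```c
#include <stddef.h>
#include <stdlib.h>

#define POOL_BYTES (132 * 1024)   /* pool size borrowed from the experiment */

/* Hypothetical bump-style pool: one large request up front, then small
 * allocations are carved out of it without going back to the system. */
struct pool {
    char  *base;          /* start of the pre-allocated region          */
    size_t used;          /* bytes handed out so far                    */
    int    os_requests;   /* how many times the backing store was asked */
};

/* Hand out `bytes` from the pool, rounded up to 16 for alignment.
 * Returns NULL when the pool is exhausted or cannot be created. */
void *pool_alloc(struct pool *p, size_t bytes) {
    size_t need;
    void *out;
    if (bytes > POOL_BYTES)
        return NULL;
    need = (bytes + 15) & ~(size_t)15;
    if (p->base == NULL) {
        p->base = malloc(POOL_BYTES);   /* stands in for a single brk() call */
        if (p->base == NULL)
            return NULL;
        p->os_requests++;
    }
    if (need > POOL_BYTES - p->used)
        return NULL;
    out = p->base + p->used;
    p->used += need;
    return out;
}

/* Release the whole pool at once. */
void pool_destroy(struct pool *p) {
    free(p->base);
    p->base = NULL;
    p->used = 0;
}
```

Starting from a zero-initialized struct pool, a thousand 1-byte allocations leave os_requests at 1, whereas serving each of them with its own mmap() call would have cost a thousand system calls.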

Since brk is so awesome, why not use brk for all allocation?

As mentioned earlier, memory allocated from the heap through brk is not returned to the operating system. Consider the following scenario.

Suppose we allocate three blocks in a row, of 10k, 20k, and 30k. If the 10k and 20k blocks are then freed, they become free memory space. If the next request is smaller than 30k, this free space can be reused.

But if the next request is larger than 30k, no free space is big enough, so malloc must request more from the OS, and the memory actually used keeps growing.

Therefore, as the program frequently calls malloc and free, especially for small blocks, more and more unusable fragments build up in the heap, which behaves like a "memory leak". This kind of "leak" cannot be detected with valgrind.

Therefore, the implementation of malloc takes the differences, advantages, and disadvantages of brk and mmap into account: by default, mmap is used only for large blocks (128 KB).
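The 10k/20k/30k scenario can be modeled in a few lines. This is a deliberately simplified sketch: sizes are in KB, block headers are ignored, the two freed blocks are assumed to merge into one 30 KB hole, and the function name is made up.

```c
#include <stddef.h>

/* Model of the scenario above, in KB: blocks of 10, 20 and 30 were
 * allocated back to back, then the 10 and 20 blocks were freed, leaving
 * one 30 KB hole at the bottom of a 60 KB heap. */
#define HOLE_KB      30
#define HEAP_TOP_KB  60

/* Heap top after one more request of `request_kb`: the hole is reused
 * when the request fits, otherwise the heap must grow via brk(). */
size_t heap_top_after(size_t request_kb) {
    if (request_kb <= HOLE_KB)
        return HEAP_TOP_KB;               /* served from the freed space */
    return HEAP_TOP_KB + request_kb;      /* hole stays unused, heap grows */
}
```

A 25 KB request leaves the heap top at 60 KB, while a 40 KB request pushes it to 100 KB even though 30 KB of the heap is sitting free.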

free() only receives a memory address. How does it know how much memory to release?

Remember the earlier observation that the address malloc returns to the user is 16 bytes past the start of the process's heap?

Those extra 16 bytes store information describing the memory block, such as its size.

In this way, when free() runs, it steps back 16 bytes from the address it receives and reads the size of the current block from those 16 bytes, so it knows how much memory to release.
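The same idea can be reproduced with a thin wrapper that keeps its own 16-byte header in front of every block. The sketch below is a hypothetical illustration of the technique, with made-up function names; it does not read or depend on glibc's real chunk header.

```c
#include <stdlib.h>
#include <string.h>

#define HEADER_BYTES 16   /* header size taken from the text */

/* Allocate `size` usable bytes, recording the size in a header that
 * sits immediately before the address handed to the caller. */
void *header_malloc(size_t size) {
    char *raw;
    if (size > (size_t)-1 - HEADER_BYTES)
        return NULL;                       /* size + header would overflow */
    raw = malloc(HEADER_BYTES + size);
    if (raw == NULL)
        return NULL;
    memcpy(raw, &size, sizeof size);       /* store the block size */
    return raw + HEADER_BYTES;             /* caller sees memory after header */
}

/* Read the block size back by stepping 16 bytes to the left. */
size_t header_block_size(const void *user_ptr) {
    size_t size;
    memcpy(&size, (const char *)user_ptr - HEADER_BYTES, sizeof size);
    return size;
}

/* Free a block from header_malloc(); returns how many bytes it held. */
size_t header_free(void *user_ptr) {
    size_t size = header_block_size(user_ptr);
    free((char *)user_ptr - HEADER_BYTES);
    return size;
}
```

A block obtained with header_malloc(100) reports 100 from header_block_size(), even though the caller only ever holds the user-visible address.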
