Let's talk about several zero-copy technologies and applicable scenarios in Linux

Jul 27, 2020 pm 05:40 PM
linux

This article discusses the main zero-copy technologies in Linux and the scenarios where each applies. To establish the concept of zero copy quickly, we start with a common scenario:

Introduction

When writing a server program (a web server or file server), file download is a basic feature. The server's task is to send a file from the host's disk through the connected socket without modifying it. We usually do this with the following code:

while((n = read(diskfd, buf, BUF_SIZE)) > 0)
    write(sockfd, buf , n);

The basic operation is to read the file content from disk into a buffer in a loop and then send the buffer content to the socket. Linux I/O is buffered by default, and the two system calls used here, read and write, hide what the operating system does on our behalf. In fact, these I/O operations copy the data several times.

When an application accesses a piece of data, the operating system first checks whether the file has been accessed recently and whether its content is already cached in a kernel buffer. If so, the operating system copies the content of the kernel buffer directly to the user-space buffer at the address buf passed to the read system call. If not, the operating system first copies the data from disk to the kernel buffer (a step that today mostly relies on DMA) and then copies the content of the kernel buffer to the user buffer.
Next, the write system call copies the content of the user buffer to the kernel buffer associated with the network stack, and finally the content of that socket buffer is sent to the network card.
A picture makes this clearer:

[Figure: data copies in the traditional read/write path]


As the figure shows, four data copies take place in total. Even though DMA handles the transfers to and from the hardware, the CPU still has to perform two of the copies. In addition, several context switches occur between user mode and kernel mode, which adds to the CPU load.

Throughout this process we never modify the file content, so copying the data back and forth between kernel space and user space is pure waste. Zero copy exists mainly to remove this inefficiency.
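For reference, the read/write loop shown earlier can be fleshed out into a complete function. This is only a sketch of the traditional path, not code from the original article: the names copy_read_write and write_all are made up for illustration, and the retry handling for short writes and EINTR is an assumption about what a real server would want.

```c
#include <errno.h>
#include <unistd.h>

#define BUF_SIZE 4096

/* Hypothetical helper: write exactly len bytes, retrying on short
 * writes and on EINTR. Returns 0 on success, -1 on error. */
int write_all(int fd, const char *buf, size_t len)
{
    while (len > 0) {
        ssize_t n = write(fd, buf, len);
        if (n < 0) {
            if (errno == EINTR)
                continue;
            return -1;
        }
        buf += n;
        len -= (size_t)n;
    }
    return 0;
}

/* Copy everything from diskfd to sockfd through a user-space buffer.
 * Every chunk crosses the kernel/user boundary twice.
 * Returns the number of bytes copied, or -1 on error. */
long copy_read_write(int diskfd, int sockfd)
{
    char buf[BUF_SIZE];
    long total = 0;

    for (;;) {
        ssize_t n = read(diskfd, buf, sizeof buf);
        if (n == 0)
            break;                  /* end of file */
        if (n < 0) {
            if (errno == EINTR)
                continue;
            return -1;
        }
        if (write_all(sockfd, buf, (size_t)n) < 0)
            return -1;
        total += n;
    }
    return total;
}
```

Each pass through the loop performs one kernel-to-user copy in read and one user-to-kernel copy in write, which are the two CPU copies the zero-copy techniques aim to remove.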

What is zero-copy technology?

The main task of zero copy is to avoid having the CPU copy data from one storage area to another. Zero-copy techniques either eliminate unnecessary copies or hand such simple transfer work to other components, freeing the CPU for other tasks and making more efficient use of system resources.

Back to the opening example: how can we reduce the number of data copies? The obvious target is the copying back and forth between kernel space and user space. This leads to one class of zero-copy techniques: those that keep the transferred data out of user space altogether.

Using mmap

One way to reduce the number of copies is to call mmap() instead of read():

buf = mmap(NULL, len, PROT_READ, MAP_SHARED, diskfd, 0);
write(sockfd, buf, len);

After the application calls mmap(), the data on disk is copied to a kernel buffer through DMA as the pages are accessed, and the operating system shares that kernel buffer with the application, so the content does not need to be copied to user space. When the application then calls write(), the operating system copies the content of the kernel buffer directly to the socket buffer, entirely in kernel mode. Finally, the socket buffer sends the data to the network card.
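The mmap approach can be sketched as a complete function. This is an illustration under assumptions rather than the article's own code: the name send_with_mmap is hypothetical, the file size is taken from fstat(), and the whole file is mapped at once, which a real server might prefer to do in windows for very large files.

```c
#include <errno.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Send the whole file behind diskfd to sockfd through a shared,
 * read-only mapping. Returns the number of bytes sent, or -1 on error. */
long send_with_mmap(int diskfd, int sockfd)
{
    struct stat st;

    if (fstat(diskfd, &st) < 0)
        return -1;
    if (st.st_size == 0)
        return 0;                   /* mmap() rejects a zero length */

    size_t len = (size_t)st.st_size;
    char *buf = mmap(NULL, len, PROT_READ, MAP_SHARED, diskfd, 0);
    if (buf == MAP_FAILED)
        return -1;

    size_t off = 0;
    while (off < len) {
        /* write() reads straight from the mapped page-cache pages */
        ssize_t n = write(sockfd, buf + off, len - off);
        if (n < 0) {
            if (errno == EINTR)
                continue;
            munmap(buf, len);
            return -1;
        }
        off += (size_t)n;
    }

    munmap(buf, len);
    return (long)off;
}
```

Note that this sketch does nothing about the file being truncated while it is mapped.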

[Figure: data path when using mmap and write]

Using mmap instead of read removes one copy, which improves efficiency when a large amount of data is transferred. But mmap comes at a cost and has some hidden pitfalls. For example, if your program has mapped a file and another process truncates it, the write system call will be terminated by a SIGBUS signal because it accesses an invalid address. By default, SIGBUS kills your process and produces a core dump, which is costly if your server goes down this way.

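We usually avoid this problem with one of the following solutions:

1. Install a signal handler for SIGBUS
When SIGBUS arrives, the handler simply returns. The write system call returns the number of bytes written before it was interrupted, and errno is set to success. This is a poor approach, however, because it does not address the root of the problem.

2. Use a file lease
This is the approach we usually take. We ask the kernel for a lease on the file descriptor. When another process tries to truncate the file, the kernel sends us a real-time signal, RT_SIGNAL_LEASE, telling us that it is breaking the read or write lease we hold on the file. The write system call is thus interrupted before the program touches invalid memory and is killed by SIGBUS. write returns the number of bytes written so far and sets errno to success.

Take the lease before mmapping the file and release it once you are done with the file: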

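/* RT_SIGNAL_LEASE is defined by the program, not by the system headers,
 * for example: #define RT_SIGNAL_LEASE (SIGRTMIN + 1).
 * F_SETSIG and F_SETLEASE need _GNU_SOURCE defined before <fcntl.h>. */
if (fcntl(diskfd, F_SETSIG, RT_SIGNAL_LEASE) == -1) {
    perror("kernel lease set signal");
    return -1;
}
/* l_type is F_RDLCK or F_WRLCK to take the lease, F_UNLCK to release it */
if (fcntl(diskfd, F_SETLEASE, l_type) == -1) {
    perror("kernel lease set type");
    return -1;
}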

Using sendfile

Starting with the 2.1 kernel series (the first stable release to include it was 2.2), Linux introduced sendfile to simplify this:

#include <sys/sendfile.h>
ssize_t sendfile(int out_fd, int in_fd, off_t *offset, size_t count);

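The sendfile() system call transfers file content (bytes) between the descriptor in_fd, which represents the input file, and the descriptor out_fd, which represents the output. out_fd must refer to a socket (since Linux 2.6.33 it may be any file), and the file behind in_fd must support mmap-like operations, so it cannot itself be a socket. These restrictions limit sendfile to passing data from a file to a socket and not the other way around.
sendfile reduces not only the number of data copies but also the number of context switches, because the data transfer stays entirely in kernel space.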

[Figure: data path when using sendfile]

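What happens if another process truncates the file while we are calling sendfile? Assuming we have not installed any signal handler, sendfile simply returns the number of bytes it transferred before it was interrupted, and errno is set to success. If we take a lease on the file before calling sendfile, the behavior is the same, and we additionally receive the RT_SIGNAL_LEASE signal.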

So far we have reduced the number of data copies, but one copy remains: from the page cache to the socket buffer. Can this copy be eliminated too?

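With help from the hardware, it can. Previously we copied the data from the page cache to the socket buffer. In fact, we only need to pass a buffer descriptor and the data length to the socket buffer; the DMA controller can then gather the data straight from the page cache and send it to the network.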

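To summarize, the sendfile system call uses the DMA engine to copy the file content into a kernel buffer and then appends buffer descriptors carrying the location and length of the data to the socket buffer. This step does not copy the data itself into the socket buffer. The DMA engine then copies the data from the kernel buffer directly to the protocol engine, which eliminates the last remaining CPU copy.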

[Figure: data path when using sendfile with DMA gather support]

This gather-copy capability, however, requires support from both the hardware and the driver.

Using splice

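sendfile only copies data from a file to a socket, which limits where it can be used. Linux 2.6.17 introduced the splice system call, which moves data between two file descriptors: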

#define _GNU_SOURCE         /* See feature_test_macros(7) */
#include <fcntl.h>
ssize_t splice(int fd_in, loff_t *off_in, int fd_out, loff_t *off_out, size_t len, unsigned int flags);

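The splice call moves data between two file descriptors without copying it back and forth between kernel space and user space. It transfers up to len bytes from fd_in to fd_out, but one of the two descriptors must refer to a pipe, which is a current limitation of splice. The flags parameter accepts the following values: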

  • SPLICE_F_MOVE: Attempt to move pages instead of copying them. This is only a hint to the kernel: if the kernel cannot move the pages from the pipe, or if the pipe buffers do not refer to full pages, the data is still copied. The initial implementation of this flag was buggy, so since Linux 2.6.21 it has been a no-op (though still accepted), and a correct implementation may be restored in a later version.
  • SPLICE_F_NONBLOCK: Do not block on the splice pipe operations. The call may still block, however, if the file descriptors being spliced do not have O_NONBLOCK set.
  • SPLICE_F_MORE: More data will follow in a subsequent splice call.

The splice call relies on the pipe buffer mechanism in Linux, which is why at least one descriptor must be a pipe.

All of the zero-copy techniques above work by reducing the copying of data between user space and kernel space. Sometimes, however, data must be copied between the two. In that case we can only work on when the copy happens. Linux typically uses copy-on-write (COW) to reduce this overhead.

For reasons of space, this article does not cover copy-on-write in detail. Roughly: if several programs access the same piece of data at the same time, each holds a pointer to it and, from its own point of view, owns the data exclusively. Only when a program needs to modify the data is the content copied into that program's own address space, at which point it becomes the program's private data. A program that never modifies the data never needs its own copy, which reduces data copying. Copy-on-write deserves an article of its own.
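One place where copy-on-write is directly observable from user space is a private file mapping. The sketch below is an illustration chosen for this purpose, not something the article itself describes: the function name private_write_leaves_file_intact is hypothetical, and it assumes fd is readable and at least len (non-zero) bytes long.

```c
#include <sys/mman.h>
#include <unistd.h>

/* Map the file privately, modify the first byte through the mapping,
 * then check what the file itself still contains.
 * Returns 1 if the file is unchanged, 0 if it changed, -1 on error. */
int private_write_leaves_file_intact(int fd, size_t len)
{
    char *p = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_PRIVATE, fd, 0);
    if (p == MAP_FAILED)
        return -1;

    char before = p[0];             /* reading shares the page-cache page */
    p[0] = (char)(before ^ 1);      /* first write gives us a private copy */

    char in_file;
    ssize_t n = pread(fd, &in_file, 1, 0);
    munmap(p, len);
    if (n != 1)
        return -1;

    return in_file == before;
}
```

Until the write, the mapping shares the page with the page cache; only the modified page is copied, and the change never reaches the file.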

There are other zero-copy techniques as well. For example, adding the O_DIRECT flag to traditional Linux I/O enables direct I/O, which bypasses the kernel's automatic caching, and there is also the still-immature fbufs technique. This article does not cover every zero-copy technique, only some common ones; interested readers can explore further on their own. Mature server projects often also modify the I/O parts of their own kernels to improve data transfer rates.
