The difference between poll and select in Linux is this: select can only monitor file descriptors numbered below the FD_SETSIZE macro, which is 1024 with glibc (an fd_set bitmask the size of 32 32-bit integers), while poll takes an array of struct pollfd whose length the caller chooses, so it has no fixed limit on the maximum number of connections.
The operating environment of this tutorial: linux7.3 system, Dell G3 computer.
Each process that uses select is limited to FD_SETSIZE descriptors, while poll has no such fixed limit because the caller decides how long the array is.
select, poll, and epoll all provide I/O multiplexing, and all of them are supported by current Linux kernels. epoll is specific to Linux, while select and poll are specified by POSIX and implemented on most operating systems.
select:
select works by setting and then checking bits in a data structure (the fd_set) that holds one flag per fd, and processing continues based on which flags are set. The disadvantages of this are:
1. The number of fds that a single process can monitor is limited.
The limit is FD_SETSIZE, a compile-time constant that is 1024 with glibc on both 32-bit and 64-bit Linux. It is separate from the system-wide limit on open file handles, which is related to system memory and can be viewed with cat /proc/sys/fs/file-max.
2. Sockets are checked with a linear scan, that is, by polling, which is inefficient:
When there are many sockets, each select() call walks every descriptor up to the highest one being watched, no matter which sockets are actually active. This wastes a lot of CPU time. If a callback is registered for each socket and runs automatically when the socket becomes active, the scan can be avoided. This is what epoll and kqueue do.
3. The data structure that stores the fds must be copied between user space and kernel space on every call, which is expensive when a large number of fds are being watched.
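The points above can be sketched with Python's standard select module, which wraps the select() system call. This is only a minimal illustration: the socket pairs stand in for client connections, and the helper name ready_with_select is hypothetical.

```python
import select
import socket

def ready_with_select(socks, timeout=0.0):
    """Return the sockets from `socks` that select() reports as readable."""
    # The full list is handed to the kernel on every call, and the kernel
    # checks every descriptor in it, whether or not it is active.
    readable, _, _ = select.select(socks, [], [], timeout)
    return readable

# Hypothetical setup: three connected socket pairs standing in for clients.
pairs = [socket.socketpair() for _ in range(3)]
watched = [reader for reader, _ in pairs]

# Only the second "client" sends data.
pairs[1][1].send(b"ping")

active = ready_with_select(watched, timeout=1.0)
active_indexes = [watched.index(sock) for sock in active]
print(active_indexes)

for reader, writer in pairs:
    reader.close()
    writer.close()
```

Note that the caller has to pass the complete list of watched sockets again on every call, which is the copy overhead described in point 3.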
poll:
poll is essentially the same as select. It copies the array passed in by the user into kernel space and then queries the device status corresponding to each fd, adding an entry to each device's wait queue as it goes. If a device is ready, the event is recorded and the traversal continues. If no ready device is found after all fds have been traversed, the current process is suspended until a device becomes ready or the timeout expires. After being woken up, it traverses the fds again. This process involves many unnecessary traversals.
It has no fixed limit on the maximum number of connections because the caller supplies an array of any length (inside the kernel the entries are kept in a linked list of blocks), but it also has shortcomings:
1. The whole fd array is copied between user space and kernel space on every call, regardless of whether the copy is useful.
2. Another feature of poll is that it is level-triggered. If an fd is reported but not handled, it will be reported again the next time poll is called.
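The level-triggered behavior of poll can be seen in a short sketch using Python's select.poll, which wraps the poll() system call. The socket pair and the variable names are assumptions made for illustration only.

```python
import select
import socket

reader, writer = socket.socketpair()
reader_fd = reader.fileno()

poller = select.poll()
poller.register(reader_fd, select.POLLIN)

writer.send(b"hello")

# poll() timeouts are given in milliseconds.
first = poller.poll(1000)    # the fd is reported as readable
second = poller.poll(1000)   # nothing was read, so it is reported again

data = reader.recv(1024)     # handle the event by consuming the data
third = poller.poll(0)       # nothing left to read, so nothing is reported

first_fds = [fd for fd, _ in first]
second_fds = [fd for fd, _ in second]
print(first_fds == second_fds, third)

reader.close()
writer.close()
```

The fd keeps appearing in the result until the data is actually read, which is what "level-triggered" means here.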
epoll:
epoll has two trigger modes: level-triggered (LT) and edge-triggered (ET). LT is the default mode, and ET is the "high-speed" mode that is selected with the EPOLLET flag. In LT mode, as long as the fd still has data to read, epoll_wait returns its event on every call to remind the user program to act. In ET mode, it notifies only once, and there are no further notifications until new data arrives, regardless of whether the fd still has readable data. Therefore, in ET mode, the fd should be non-blocking and its buffer must be drained when it is read, that is, read until the return value of read is less than the requested size or an EAGAIN error is returned. Another feature is that epoll uses an event-based readiness notification: an fd is registered through epoll_ctl, and once the fd is ready, the kernel uses a callback-like mechanism to mark it as ready so that epoll_wait receives the notification.
Why does epoll have EPOLLET trigger mode?
In level-triggered mode, if there are a large number of ready file descriptors that you do not need to read or write, they are returned every time epoll_wait is called, which greatly reduces how efficiently the program can find the ready file descriptors it actually cares about. In edge-triggered mode (EPOLLET), when a readable or writable event occurs on a monitored file descriptor, epoll_wait() notifies the handler to read or write. If not all of the data is read or written this time (for example, because the buffer is too small), you are not notified again on the next call to epoll_wait(). You are notified only once, and not again until the next readable or writable event occurs on that file descriptor. This mode can be more efficient than level triggering, because the program is not flooded with ready file descriptors that it does not care about.
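The difference between the two trigger modes can be sketched with Python's select.epoll (Linux only). This is an illustration under stated assumptions, not a reference implementation: the helper names drain and count_notifications are hypothetical, and a socket pair stands in for a real connection.

```python
import select
import socket

def drain(sock, chunk=4):
    """Read until the socket would block (EAGAIN) and return all the bytes."""
    received = b""
    while True:
        try:
            part = sock.recv(chunk)
        except BlockingIOError:  # EAGAIN: the receive buffer is empty
            return received
        if not part:             # the peer closed the connection
            return received
        received += part

def count_notifications(edge_triggered, polls=3):
    """Poll several times without reading and count how often the fd is reported."""
    reader, writer = socket.socketpair()
    reader.setblocking(False)    # required for the drain loop above
    ep = select.epoll()
    mask = select.EPOLLIN | (select.EPOLLET if edge_triggered else 0)
    ep.register(reader.fileno(), mask)

    writer.send(b"x" * 16)

    # epoll.poll() takes its timeout in seconds.
    hits = sum(1 for _ in range(polls) if ep.poll(0.1))

    # In ET mode this is what must be done after the single notification.
    leftover = drain(reader)

    ep.close()
    reader.close()
    writer.close()
    return hits, leftover

lt_hits, lt_data = count_notifications(edge_triggered=False)
et_hits, et_data = count_notifications(edge_triggered=True)
print("level-triggered notifications:", lt_hits)
print("edge-triggered notifications:", et_hits)
```

In the edge-triggered run the unread data is still in the buffer after the single notification, which is why the drain loop reads until EAGAIN instead of reading once.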
Advantages of epoll:
1. No practical limit on concurrent connections. The upper limit of FDs that can be monitored is much greater than 1024 (roughly 100,000 connections with 1 GB of memory);
2. Better efficiency. It does not poll, so efficiency does not drop as the number of FDs grows. Only FDs that are active and ready trigger the callback;
The biggest advantage of epoll is that its cost depends on your "active" connections and not on the total number of connections. Therefore, in a real network environment, epoll is usually much more efficient than select and poll.
3. Less memory copying. FDs are registered once with epoll_ctl and kept inside the kernel, so the full fd set is not copied on every call, and epoll_wait copies only the ready events to user space. The Linux implementation does not rely on mmap() for this.
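The "register once, get back only active fds" behavior listed above can be sketched as follows with Python's select.epoll (Linux only). The 50 socket pairs and the indexes 7 and 42 are arbitrary values chosen for illustration.

```python
import select
import socket

# Hypothetical setup: 50 idle connections, represented by socket pairs.
pairs = [socket.socketpair() for _ in range(50)]

ep = select.epoll()
index_by_fd = {}
for index, (reader, _) in enumerate(pairs):
    # Each fd is registered once; it is not passed again on later waits.
    ep.register(reader.fileno(), select.EPOLLIN)
    index_by_fd[reader.fileno()] = index

# Only two of the 50 connections become active.
for index in (7, 42):
    pairs[index][1].send(b"data")

# The result contains only the ready fds, not all 50 registered ones.
events = ep.poll(1.0)
active = sorted(index_by_fd[fd] for fd, _ in events)
print(active)

ep.close()
for reader, writer in pairs:
    reader.close()
    writer.close()
```

The program never has to scan the 48 idle connections itself, which is where the advantage over select and poll comes from when most connections are idle.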
Summary of differences between select, poll, and epoll:
1. Maximum number of connections that a single process can open
select
The maximum number of connections that a single process can monitor is defined by the FD_SETSIZE macro. With glibc on Linux it is 1024, the number of bits in 32 32-bit integers (32*32); the value is the same on 64-bit machines. It can only be changed by modifying the macro and recompiling, and performance may be affected, which requires further testing.
poll
poll is essentially the same as select, but it has no fixed limit on the maximum number of connections because the caller supplies an array of any length
epoll
Although there is an upper limit on the number of connections, it is very large. A machine with 1G memory can open about 100,000 connections, and a machine with 2G memory can open about 200,000 connections.
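The limits discussed in this comparison can be inspected from a program. The sketch below uses Python's resource module to read the per-process open-file limit, which bounds how many connections poll or epoll can watch in practice. Python does not export FD_SETSIZE, so the value 1024 below is an assumption based on glibc, and the helper usable_with_select is hypothetical.

```python
import resource

# Assumption: FD_SETSIZE as defined by glibc on Linux.
ASSUMED_FD_SETSIZE = 1024

def usable_with_select(fd):
    """select() can only track descriptors numbered below FD_SETSIZE."""
    return 0 <= fd < ASSUMED_FD_SETSIZE

# Per-process limit on open file descriptors (what `ulimit -n` reports).
soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
print("open-file limit (soft, hard):", soft, hard)

print("fd 1023 usable with select:", usable_with_select(1023))
print("fd 1024 usable with select:", usable_with_select(1024))
```

Raising the open-file limit helps poll and epoll, but it does not lift the FD_SETSIZE restriction on select.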
2. I/O efficiency problems caused by a sharp increase in FDs
select
Because all connections are traversed linearly on every call, the traversal slows down as the number of FDs increases, which causes a "linear decline" in performance.
poll
Same as above
epoll
Because the kernel implementation of epoll is based on a callback on each fd, only active sockets trigger the callback. When there are few active sockets, epoll therefore does not have the linear decline in performance of the previous two. However, when all sockets are active, there may still be performance problems.
3. Message passing method
select
The kernel needs to pass the results to user space, which requires copying the fd set between kernel and user space on every call
poll
Same as above
epoll
epoll keeps the registered fds inside the kernel, so each epoll_wait call only copies the ready events to user space.
Summary:
In summary, when choosing select, poll, or epoll, you should consider the specific use occasions and the characteristics of these three methods.
1. On the surface, epoll has the best performance, but when the number of connections is small and the connections are very active, select and poll may perform better than epoll. After all, epoll's notification mechanism requires many function callbacks.
2. select is inefficient because it has to scan all fds on every call. But this inefficiency is relative: depending on the situation, it can be mitigated through good design.