
A Detailed Illustrated Explanation of IO Multiplexing in Java

黄舟
Release: 2017-05-28 09:23:00

This article introduces the fundamentals of IO multiplexing as it relates to Java. It is a useful reference for readers who need it.

What we want from a server's concurrency is this: in every millisecond, the server can promptly process the hundreds of messages that arrived within that millisecond on different TCP connections, while at the same time there may be hundreds of thousands of other connections that have not sent or received anything for the last few seconds and are relatively inactive. Processing multiple connections in parallel at the same time is concurrency; processing tens or hundreds of thousands of connections at the same time is high concurrency. What concurrent server programming pursues is handling as many concurrent connections as possible while keeping the CPU and other resources efficiently used, until some physical resource is exhausted first.

There are many concurrent programming models. The simplest binds a connection to a thread: one thread handles the entire life cycle of one connection. Its advantages: the model is simple, it can express complex business logic, and the number of threads can be much larger than the number of CPUs. But the number of threads cannot grow without limit. Why not? Because the kernel's scheduling algorithm decides when a thread runs, and the scheduler does not know that a given thread only serves one connection; it applies one uniform policy: run the thread when its time slice comes around, even if the thread will immediately go back to sleep. This back-and-forth of waking and sleeping threads is cheap when it happens only occasionally, but when the total number of threads in the system is large it becomes expensive (the cost is amplified), because the scheduling overhead eats into the time available for running business code. The threads bound to inactive connections are like inefficient state-owned enterprises: they keep waking up and going back to sleep while doing no useful work, and every time they wake up to compete for the CPU, the "private enterprise" threads handling active connections get fewer chances at the CPU. The CPU is the core resource, and this inefficiency drags down the total throughput, the system's "GDP". Since we are aiming to handle hundreds of thousands of connections concurrently, once thousands of threads appear, the system can no longer run efficiently enough for high concurrency.

For high-concurrency programming there is, at present, essentially only one model that works. Message handling on a connection can be divided into two stages: waiting for a message to be ready, and processing the message. With the default blocking socket (for example, the thread-per-connection model above), the two stages are fused into one: the thread running the socket code must sleep while waiting for the message to become ready, which under high concurrency means threads constantly sleeping and waking, and with that, degraded CPU efficiency.
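
To make the thread-per-connection model concrete, here is a minimal sketch of a blocking echo server in Java. The class and port are my own choices for illustration, and error handling is deliberately stripped down:

```java
import java.io.InputStream;
import java.io.OutputStream;
import java.net.ServerSocket;
import java.net.Socket;

// Thread-per-connection model: one thread owns one connection for its whole
// life cycle, blocking on read() whenever the peer is idle.
public class ThreadPerConnectionServer {
    public static void main(String[] args) throws Exception {
        ServerSocket server = new ServerSocket(8080);
        while (true) {
            Socket socket = server.accept();          // blocks until a client connects
            new Thread(() -> handle(socket)).start(); // one dedicated thread per connection
        }
    }

    private static void handle(Socket socket) {
        try (InputStream in = socket.getInputStream();
             OutputStream out = socket.getOutputStream()) {
            byte[] buf = new byte[1024];
            int n;
            while ((n = in.read(buf)) != -1) { // the thread sleeps here on an inactive connection
                out.write(buf, 0, n);          // echo back whatever arrived
            }
        } catch (Exception ignored) {
        } finally {
            try { socket.close(); } catch (Exception ignored) { }
        }
    }
}
```

With a few hundred clients this works fine; with hundreds of thousands, most of these threads would spend their lives sleeping in read() and being needlessly woken by the scheduler, which is exactly the cost described above.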

The high-concurrency approach is, of course, to separate these two stages: the code that waits for messages to become ready is split off from the code that processes them. This also requires the sockets to be non-blocking; otherwise the message-processing code can easily put the thread to sleep whenever its conditions are not met. So the question becomes: how do we implement the "waiting for the message to be ready" stage? It is still waiting, which seems to mean the thread still has to sleep! The solution is either to poll actively, or to let one thread wait on all connections at once. The latter is IO multiplexing. Multiplexing is still about waiting for messages to become ready, but it can watch many connections at the same time. It may still "wait", and so it may still put the thread to sleep, but that does not matter: it is one thread for many connections, and it monitors all of them, so whenever our thread wakes up there are guaranteed to be connections ready for our code to work on. That is efficient, and there is no longer a crowd of threads competing over the "waiting for the message to be ready" stage.

There are many implementations of multiplexing. On Linux, before the 2.4 kernel the main ones were select and poll; today the mainstream is epoll. Their usage looks quite different, but the essence is the same. Their efficiency, however, is not, which is why epoll has completely replaced select.
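
In Java, this "one thread waits for all connections" model is exposed through java.nio's Selector, which on Linux is typically backed by epoll. A minimal sketch, again with my own class name and port, might look like this:

```java
import java.net.InetSocketAddress;
import java.nio.ByteBuffer;
import java.nio.channels.SelectionKey;
import java.nio.channels.Selector;
import java.nio.channels.ServerSocketChannel;
import java.nio.channels.SocketChannel;
import java.util.Iterator;

// One thread multiplexes all connections: it sleeps in select() until at least
// one registered channel is ready, then handles only those channels.
public class SelectorServer {
    public static void main(String[] args) throws Exception {
        Selector selector = Selector.open();
        ServerSocketChannel server = ServerSocketChannel.open();
        server.bind(new InetSocketAddress(8080));
        server.configureBlocking(false);                    // multiplexing requires non-blocking sockets
        server.register(selector, SelectionKey.OP_ACCEPT);

        ByteBuffer buf = ByteBuffer.allocate(1024);
        while (true) {
            selector.select();                              // wait until *some* connection is ready
            Iterator<SelectionKey> it = selector.selectedKeys().iterator();
            while (it.hasNext()) {
                SelectionKey key = it.next();
                it.remove();
                if (key.isAcceptable()) {
                    SocketChannel client = server.accept();
                    if (client != null) {
                        client.configureBlocking(false);
                        client.register(selector, SelectionKey.OP_READ);
                    }
                } else if (key.isReadable()) {
                    SocketChannel client = (SocketChannel) key.channel();
                    buf.clear();
                    int n = client.read(buf);
                    if (n == -1) {                          // peer closed the connection
                        key.cancel();
                        client.close();
                    } else if (n > 0) {
                        buf.flip();
                        client.write(buf);                  // echo back (may be partial; kept simple here)
                    }
                }
            }
        }
    }
}
```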

Let's briefly look at why epoll replaced select.

As mentioned earlier, the core of the high-concurrency solution is to have one thread handle "waiting for messages to be ready" for all connections. On this point epoll and select do not differ. But select got one estimate wrong. As we said at the beginning, when there are hundreds of thousands of concurrent connections, only a few hundred of them may be active in any given millisecond, while the remaining hundreds of thousands are inactive during that millisecond. select is used like this:

returned active connections = select(all connections to be monitored)

When is select called? You call it whenever you want to find out which connections have received data. So under high concurrency, select gets called frequently, and we have to ask whether this frequently called function is efficient, because any small inefficiency it has will be amplified by the word "frequently". Does it have one? Obviously: hundreds of thousands of connections are passed in to be monitored, and only a few hundred active ones come back, which is wasteful by construction. Once that waste is amplified, select simply cannot cope with tens of thousands of concurrent connections.
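
To make that cost concrete, here is a deliberately simplified model of the select pattern. The classes below are toy types of my own, not a real kernel API; they only illustrate the per-call scan:

```java
import java.util.ArrayList;
import java.util.List;

// Toy model of select(): every call receives the full set of monitored
// connections and must scan all of them, even though only a few are ready.
// With 100,000 connections and a few hundred active ones, almost all of the
// per-call work is wasted - and the call runs many times per second.
class ToySelect {
    static List<ToyConnection> select(List<ToyConnection> allMonitored) {
        List<ToyConnection> ready = new ArrayList<>();
        for (ToyConnection c : allMonitored) {  // O(n) scan on *every* call
            if (c.hasPendingData()) {
                ready.add(c);
            }
        }
        return ready;                           // typically a few hundred out of 100,000
    }
}

// Hypothetical connection type, present only so the sketch compiles.
class ToyConnection {
    private final boolean pending;
    ToyConnection(boolean pending) { this.pending = pending; }
    boolean hasPendingData() { return pending; }
}
```
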
When the number of concurrent connections is below about 1,000, select is not called all that frequently and there does not seem to be much difference from epoll.

Once the number of concurrent connections grows, however, select's shortcomings are magnified without bound by the frequent calls, and the higher the concurrency, the more obvious the gap becomes. (The original article illustrates both cases with comparison charts.)

Now let's talk about how epoll solves this. Very cleverly, it splits what the single select call does into three calls:

new epoll descriptor = epoll_create()

epoll_ctl(epoll descriptor, add or delete a connection to be monitored)

returned active connections = epoll_wait(epoll descriptor)

The main benefit of this split is that it separates the operations that are called frequently from those that are called rarely. For example, epoll_ctl is called relatively rarely, while epoll_wait is called very frequently, and epoll_wait takes almost no input. That is far more efficient than select, and the amount of data passed in no longer grows with the number of concurrent connections, so kernel efficiency does not fall off as concurrency rises.
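
For Java readers, these three calls map fairly directly onto the java.nio Selector API used in the earlier sketch (on Linux this API is typically implemented on top of epoll). The mapping below is a simplification, shown only to connect the two views:

```java
import java.nio.channels.SelectionKey;
import java.nio.channels.Selector;
import java.nio.channels.SocketChannel;

// Rough correspondence between epoll's three calls and java.nio.
public class EpollMappingSketch {
    void sketch(SocketChannel channel) throws Exception {
        // epoll_create(): build the kernel-side structure once.
        Selector selector = Selector.open();

        // epoll_ctl(ADD): called rarely, only when a connection appears.
        channel.configureBlocking(false);
        SelectionKey key = channel.register(selector, SelectionKey.OP_READ);

        // epoll_wait(): called constantly, with essentially no input to copy;
        // afterwards selector.selectedKeys() holds only the ready connections.
        selector.select();

        // epoll_ctl(DEL): deregister when the connection goes away.
        key.cancel();
    }
}
```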

How does epoll implement this? It is actually quite simple. As the three calls suggest, epoll is smarter than select precisely because the frequently called epoll_wait, which asks "which connections already have messages ready", no longer needs to be handed all the monitored connections each time. That means the kernel keeps a data structure of its own holding every connection to be monitored. That data structure is a red-black tree, and nodes are added to and removed from it via epoll_ctl.

In the original diagram, the red-black tree in the lower left holds all the connections being monitored, while the linked list in the upper left holds the connections that are currently active. When epoll_wait executes, it only has to look at that ready list and return the connections on it to the user. How could epoll_wait possibly be slow?
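
As a rough mental model only (the real bookkeeping lives in the kernel, and the names below are mine), the structure can be imagined in Java with a TreeMap, which happens to be a red-black tree, for the monitored set and a queue for the ready list:

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import java.util.TreeMap;

// Toy model of epoll's bookkeeping: a red-black tree of every monitored
// connection plus a ready list holding only the connections with pending
// events. The "wait" call never scans the whole tree.
class ToyEpoll {
    private final TreeMap<Integer, Object> monitored = new TreeMap<>(); // fd -> connection (red-black tree)
    private final Deque<Integer> readyList = new ArrayDeque<>();        // fds with pending events

    void ctlAdd(int fd, Object connection) { monitored.put(fd, connection); } // like epoll_ctl(ADD)
    void ctlDel(int fd)                    { monitored.remove(fd); }          // like epoll_ctl(DEL)

    // Invoked by the "kernel side" of the model when data arrives on a connection.
    void onEvent(int fd) {
        if (monitored.containsKey(fd)) {
            readyList.add(fd);
        }
    }

    // Like epoll_wait: returns only what is already on the ready list,
    // no matter how many connections are being monitored.
    List<Integer> waitForEvents() {
        List<Integer> result = new ArrayList<>(readyList);
        readyList.clear();
        return result;
    }
}
```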

Finally, let's look at the two modes epoll offers, ET and LT, usually translated as edge-triggered and level-triggered; both names are actually fairly apt. The two modes are still about efficiency, only now the question becomes how to make the connections returned by epoll_wait more precise.

For example, suppose we need to monitor whether a connection's write buffer has free space, so that when it becomes "writable" we can call write from user space to send the response to the client. But perhaps, at the moment the connection becomes writable, the "response" content is still on disk and the disk read has not finished. The thread must not block, so no response is sent. Yet the connection may be returned again the next time we call epoll_wait, and we have to check again whether there is anything to do. Perhaps our program has a separate module dedicated to disk IO, and it will send the response once the disk read completes. In that case, does it really match the user's expectation for epoll_wait to keep returning this "writable" connection that cannot be acted on immediately?

This is where the ET and LT modes come in. With LT, every connection that is in the expected state is returned by every epoll_wait call; all such connections are treated equally, hence "level". ET is different: it prefers to return connections more precisely. In the example above, after the connection first becomes writable, if the program never writes any data to it, epoll_wait will not return that connection again. ET is called edge-triggered because epoll_wait only reports a connection when it transitions from one state to another. Clearly, programming against ET is considerably harder: the application must take care never to leave a connection returned by epoll_wait in the state of "writable, nothing written, yet still expecting the next 'writable'", or "readable, nothing read, yet still expecting the next 'readable'".
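
Java's standard Selector behaves in the level-triggered style, so the "writable connection returned every time" problem from the example above is usually handled by toggling interest in OP_WRITE: register for writability only while data is actually queued, and drop it once the queue drains. A minimal sketch, with my own method names:

```java
import java.nio.ByteBuffer;
import java.nio.channels.SelectionKey;
import java.nio.channels.SocketChannel;

// With level-triggered readiness, a socket whose send buffer has room is
// reported as writable on every wait call. The usual fix is to express
// interest in OP_WRITE only while there is pending data to send.
public class WriteInterestSketch {

    // Called when the application has produced a response for this connection.
    static void queueResponse(SelectionKey key, ByteBuffer pending) {
        key.attach(pending);                                         // stash the unsent data on the key
        key.interestOps(key.interestOps() | SelectionKey.OP_WRITE);  // start asking for "writable"
    }

    // Called when select()/epoll_wait reports the connection as writable.
    static void onWritable(SelectionKey key) throws Exception {
        SocketChannel channel = (SocketChannel) key.channel();
        ByteBuffer pending = (ByteBuffer) key.attachment();
        channel.write(pending);                                      // may write only part of the buffer
        if (!pending.hasRemaining()) {
            key.attach(null);
            key.interestOps(key.interestOps() & ~SelectionKey.OP_WRITE); // stop being woken for "writable"
        }
    }
}
```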

Of course, in most application scenarios the two modes will not differ much in performance. The potential advantage of ET is that epoll_wait returns fewer times, and in some scenarios a connection will not wake us up when there is nothing useful to do (here "wake up" means epoll_wait returning). But as the example above shows, sometimes the deciding factor is not the network itself but the application scenario. That said, most open-source frameworks are written against ET: a framework is chasing a purely technical problem and naturally strives for the optimum.

