
Node.js Asynchronous I/O Study Notes


The term "asynchronous" became widely popular in the Web 2.0 wave, which swept the Web along with Javascript and AJAX. But in most high-level programming languages, asynchrony is rare. PHP best embodies this feature: it not only blocks asynchronous, but does not even provide multi-threading. PHP is executed in a synchronous blocking manner. This advantage helps programmers write business logic sequentially, but in complex network applications, blocking prevents better concurrency.

On the server side, I/O is very expensive, and distributed I/O is even more so. Only when the back end can respond to resource requests quickly can the front-end experience improve. Node.js is the first platform to take asynchrony as its primary programming model and design philosophy. Along with asynchronous I/O come event-driven programming and a single-threaded execution model, which together set the tone of Node. This article introduces how Node implements asynchronous I/O.

1. Basic concepts

"Asynchronous" and "non-blocking" sound like the same thing. In terms of practical effects, both achieve the purpose of parallelism. But from the perspective of computer kernel I/O, there are only two methods: blocking and non-blocking. So asynchronous/synchronous and blocking/non-blocking are actually two different things.

1.1 Blocking I/O and non-blocking I/O

The defining characteristic of blocking I/O is that the call does not return until the operation has fully completed at the kernel level. Taking reading a file from disk as an example, the call ends only after the kernel has finished the disk seek, read the data, and copied it into memory.

Blocking I/O leaves the CPU waiting on I/O, wasting that time and underusing the CPU's processing power. Non-blocking I/O, by contrast, returns immediately after the call, and the CPU time slice can then be used for other work. Because the I/O has not actually completed, what comes back immediately is not the data the business layer expects but merely the status of the current call. To obtain the complete data, the application must repeatedly invoke the I/O operation to check whether it has finished, which is known as polling. Polling techniques include the following:

1. read: checking the I/O status through repeated calls; the most primitive and lowest-performing approach
2. select: an improvement over read that judges readiness from the event status of file descriptors; its drawback is the hard limit on the number of file descriptors it can watch
3. poll: an improvement over select that uses a linked list to avoid the descriptor limit, but performance still degrades when there are many descriptors
4. epoll: if no I/O event is detected when polling begins, it sleeps until an event occurs and wakes it up; this is currently the most efficient I/O event notification mechanism on Linux

Polling satisfies non-blocking I/O's need to confirm that the complete data has been acquired, but from the application's point of view it is still essentially synchronous, because the application still spends time waiting for the I/O to fully return. During that wait, the CPU is either busy traversing file descriptor status or asleep waiting for an event to occur.
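To make the contrast concrete, here is a minimal sketch of how the two call styles look from the application's point of view in Node. It assumes a file ./data.txt exists; Node does not expose raw kernel-level non-blocking calls plus polling to JavaScript, so this only illustrates the blocking-versus-immediate-return effect, not the polling mechanism itself.

const fs = require('fs');

// Blocking style: readFileSync does not return until the data is in memory.
const data = fs.readFileSync('./data.txt');
console.log('sync read done:', data.length, 'bytes');

// Asynchronous style: the call returns immediately; the data arrives later
// through the callback.
fs.readFile('./data.txt', (err, buf) => {
  if (err) throw err;
  console.log('async read done:', buf.length, 'bytes');
});
console.log('readFile returned before the data was available');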

1.2 Asynchronous I/O in ideal and reality

Ideal asynchronous I/O would let the application issue a non-blocking call and move straight on to the next task without any polling; once the I/O completes, the data would simply be handed to the application through a signal or a callback.

Asynchronous I/O in reality is implemented differently on different operating systems: *nix platforms use a custom thread pool, while Windows uses the IOCP model. Node uses libuv as an abstraction layer that encapsulates the platform differences, so the asynchronous I/O seen by upper-level Node code is independent of the underlying platform's implementation. It also needs to be emphasized that when we say Node is single-threaded, this only means that JavaScript executes in a single thread; inside Node there is a thread pool that actually carries out the I/O tasks.
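The split between the single JavaScript thread and the internal I/O thread pool can be observed at the application level. A minimal sketch, assuming a file ./data.txt exists: several fs.readFile() calls are handed off to libuv's thread pool, which defaults to 4 threads and can be resized with the UV_THREADPOOL_SIZE environment variable, while the JavaScript thread keeps running.

const fs = require('fs');

// Each read is dispatched to libuv's thread pool; none of them blocks
// the JavaScript thread.
for (let i = 0; i < 4; i++) {
  fs.readFile('./data.txt', (err, data) => {
    if (err) throw err;
    console.log(`read ${i} finished on a later tick`);
  });
}

console.log('all four calls returned immediately; the JS thread continues');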

2. Asynchronous I/O of Node

2.1 Event Loop

Node's execution model is essentially an event loop. When the process starts, Node creates a loop, and each pass through the loop body is called a Tick. In each Tick, Node checks whether there are events waiting to be processed; if so, it takes out the event and its associated callbacks, executes the callbacks if any exist, and then enters the next iteration. When there are no more events to process, the process exits.
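The Tick logic described above can be sketched in plain JavaScript. This is only a conceptual illustration, not Node's actual implementation; the queue and the fetchPendingEvents() and hasMoreWork() helpers are hypothetical stand-ins for the observers discussed in the next subsection.

// Hypothetical stand-ins so the sketch is self-contained.
const queue = [{ callback: (d) => console.log('handled', d), data: 'event-1' }];
function fetchPendingEvents() { return queue.splice(0, queue.length); }
function hasMoreWork() { return queue.length > 0; }

// One Tick: fetch pending events, run their callbacks if any, report whether
// anything is still pending.
function tick() {
  const events = fetchPendingEvents();
  for (const event of events) {
    if (event.callback) {
      event.callback(event.data);
    }
  }
  return hasMoreWork();
}

// Each pass through the loop body is one Tick; exit when nothing remains.
while (tick()) { /* next Tick */ }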

2.2 Observer

Each event loop has one or more observers, and the loop determines whether there are events to process by asking these observers. The event loop is a typical producer/consumer model. In Node, events come mainly from network requests, file I/O, and the like, and each kind has a corresponding observer (a network I/O observer, a file I/O observer, and so on). The event loop takes events from the observers and processes them.
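The producer/consumer relationship can be illustrated at the application level with Node's EventEmitter. This is only an analogy for the pattern, not the internal observer mechanism, and the event name 'io-finished' is made up for the example.

const EventEmitter = require('events');

const observer = new EventEmitter();

// Consumer side: the callback associated with the event.
observer.on('io-finished', (result) => {
  console.log('consumed:', result);
});

// Producer side: something (here a timer standing in for an I/O completion)
// produces the event later.
setTimeout(() => observer.emit('io-finished', 'some data'), 100);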

2.3 Request object

In the transition from JavaScript initiating a call to the kernel completing the I/O operation, there is an intermediate product called the request object. Take the simplest case, fs.open() on Windows (open a file and obtain a file descriptor according to the given path and flags), as an example: the call passes from JavaScript into the built-in module and, through libuv, ultimately invokes uv_fs_open(). During this call an FSReqWrap request object is created; the parameters and methods passed in from the JavaScript layer are wrapped in this request object, and the callback function we care about most is set on its oncomplete_sym property. Once wrapped, the FSReqWrap object is pushed into the thread pool to wait for execution.

At this point the JavaScript call returns immediately and the JavaScript thread can continue with subsequent work, while the I/O operation waits to be executed in the thread pool. This completes the first phase of the asynchronous call.
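At the application level, the two phases look like this; a minimal sketch assuming a file ./file.txt exists:

const fs = require('fs');

fs.open('./file.txt', 'r', (err, fd) => {
  // Second phase: the callback attached to the request object runs once the
  // thread pool has finished the open and the event loop picks up the result.
  if (err) throw err;
  console.log('file descriptor:', fd);
  fs.close(fd, () => {});
});

// First phase: fs.open() has already returned; the request object is queued
// in the thread pool while the JS thread keeps going.
console.log('fs.open() returned immediately');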

2.4 Execution callback

Callback notification is the second phase of asynchronous I/O. After the I/O operation in the thread pool completes, the result is stored, the IOCP is notified that the operation on the current object has finished, and the thread is handed back to the thread pool. During each Tick, the event loop's I/O observer calls the relevant method to check whether the thread pool holds any completed requests; if so, the request object is added to the I/O observer's queue and is then treated as an event.

3. Non-I/O asynchronous API

Node also has some asynchronous APIs that have nothing to do with I/O, such as the timers setTimeout() and setInterval(), and process.nextTick() and setImmediate(), which execute a task asynchronously right away. Here is a brief introduction.

3.1 Timer API

setTimeout() and setInterval() are consistent with the browser-side APIs of the same names. Their implementation is similar in principle to asynchronous I/O, but it does not involve the I/O thread pool. A timer created by calling one of these APIs is inserted into a red-black tree inside the timer observer. On each Tick of the event loop, the timer observer iterates over the timer objects in the red-black tree and checks whether each has passed its scheduled time; if it has, a timer event is formed and the callback is executed immediately. The main problem with timers is that their timing is not perfectly precise (at millisecond scale, within a tolerance).
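The imprecision is easy to observe: a timer callback runs on the first Tick after its deadline, so anything that keeps the JavaScript thread busy pushes it back. A small sketch:

const start = Date.now();

setTimeout(() => {
  // Often prints noticeably more than 10, because the Tick that would fire
  // the timer is delayed by the busy loop below.
  console.log('elapsed:', Date.now() - start, 'ms');
}, 10);

// Keep the JS thread busy past the timer's deadline.
while (Date.now() - start < 25) { /* spin */ }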

3.2 Immediate asynchronous task execution API

Before Node appeared, in order to execute a task asynchronously as soon as possible, many people might have written the following:


setTimeout(function() {
// TODO
}, 0);

Because of how the event loop works, timers are not precise enough, and using a timer involves a red-black tree whose operations cost O(log(n)). The process.nextTick() method simply puts the callback function into a queue and takes it out for execution on the next Tick; that is an O(1) operation and therefore more efficient.

There is also setImmediate(), which similarly defers the execution of a callback. However, process.nextTick() has a higher priority than setImmediate(), because the event loop checks its observers in a fixed order. In addition, process.nextTick() callbacks are stored in an array, and every callback in the array is executed on each Tick, whereas setImmediate() callbacks are kept in a linked list and only one of them is executed per Tick.
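The priority difference can be seen directly. A small sketch (the exact interleaving of the setImmediate() callbacks depends on the Node version, but the nextTick callbacks always run first):

process.nextTick(() => console.log('nextTick 1'));
process.nextTick(() => console.log('nextTick 2'));

setImmediate(() => console.log('setImmediate 1'));
setImmediate(() => console.log('setImmediate 2'));

console.log('normal execution');
// Expected output begins with: normal execution, nextTick 1, nextTick 2,
// followed by the setImmediate callbacks.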

4. Event-driven and high-performance server

The earlier example used fs.open() to explain how Node implements asynchronous I/O. In fact, Node applies asynchronous I/O to network sockets as well, and this is the foundation on which Node builds web servers. Classic server models include:

1. Synchronous: only one request is handled at a time; all other requests wait
2. Per process / per request: a process is started for each request, but system resources are limited, so this does not scale
3. Per thread / per request: a thread is started for each request; threads are lighter than processes, but each still occupies a certain amount of memory, so under heavy concurrency memory is exhausted quickly

The famous Apache server uses the per-thread/per-request model, which is why it struggles with high concurrency. Node handles requests in an event-driven way, which avoids the overhead of creating and destroying threads; and because the operating system has fewer threads to schedule, context-switching costs are also low. Even with a large number of connections, Node handles requests in an orderly manner.
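A minimal sketch of an event-driven Node web server: every request is dispatched as an event to a callback on the single JavaScript thread, and no process or thread is created per request (port 3000 is an arbitrary choice for the example).

const http = require('http');

const server = http.createServer((req, res) => {
  // Invoked once per incoming request, as a 'request' event on the server.
  res.writeHead(200, { 'Content-Type': 'text/plain' });
  res.end('hello from the event loop\n');
});

server.listen(3000, () => {
  console.log('listening on http://localhost:3000');
});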

The well-known server Nginx has also abandoned multi-threading in favor of the same event-driven approach as Node, and it now has real potential to displace Apache. Nginx is written in pure C and performs very well, but it is suited only to roles such as web server, reverse proxy, or load balancer. Node can provide the same kinds of functionality, can also implement specific business logic, and its own performance is good. In real projects we can combine their respective strengths to get the best performance out of an application.
