Let's talk about asynchronous implementation and event driving in Node - JS Tutorial - php.cn

This article walks you through asynchronous implementation and the event-driven model in Node. I hope you find it helpful!

Characteristics of Node

Tasks in a computer generally fall into two categories: IO-intensive and compute-intensive. A compute-intensive task can only keep squeezing performance out of the CPU. An IO-intensive task ideally does not need to: the program only has to notify the IO device to do the work and come back for the data a little later.

In some scenarios, several unrelated tasks need to be completed. There are currently two mainstream approaches:

Multi-threaded parallel execution: the cost of multi-threading is the high overhead of creating threads and switching thread contexts. In addition, in complex business logic, multi-threaded programming often runs into problems such as locks and state synchronization;
Single-threaded sequential execution: easy to express, but serial execution hurts performance, because any slightly slow task blocks the code that comes after it.

Node's solution sits between the two: it uses a single thread to stay clear of multi-thread deadlocks, state synchronization, and similar problems, and it uses asynchronous IO to keep that single thread from blocking so the CPU is better used.
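To make the asynchronous IO part concrete, here is a minimal sketch. It assumes a throwaway file in the system temp directory; the file name and the `readDemo` helper are made up for illustration. It records the order of events around an `fs.readFile` call: the call returns right away and the callback runs later, so the single thread is never parked waiting for the disk.

```javascript
const fs = require("fs");
const os = require("os");
const path = require("path");

// Hypothetical scratch file, created only for this demo.
const demoFile = path.join(os.tmpdir(), `node-async-demo-${process.pid}.txt`);
fs.writeFileSync(demoFile, "hello from disk");

// Records the order in which the steps happen.
const order = [];

function readDemo(done) {
  order.push("call issued");
  // Asynchronous read: readFile returns at once, the callback fires later.
  fs.readFile(demoFile, "utf8", (err, text) => {
    order.push("callback ran");
    fs.unlink(demoFile, () => done(err, text)); // clean up, then report
  });
  order.push("call returned");
}
```

Calling `readDemo((err, text) => console.log(order, text))` shows "call returned" recorded before "callback ran".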

How Node implements asynchrony

That was Node's approach to multi-tasking, but implementing it inside Node is not easy. First, a few operating system concepts, so that the asynchronous implementation and Node's event loop mechanism discussed later are easier to understand:

Blocking IO and non-blocking IO

Blocking IO: after the application initiates the IO call, it waits for the data the whole time. The call returns only after the operating system kernel has completed all the operations;

Everything in the operating system is a file, and input and output devices are abstracted as files as well. When the kernel performs IO operations, it manages them through file descriptors.

Non-blocking IO: the difference is that the call returns immediately, without waiting for the data. The CPU time slice can then be used for other work, and the result is fetched later through the file descriptor;

A problem with non-blocking IO: although CPU utilization goes up, the call returns before the IO is complete, so the application does not know when the operation finishes. To detect the status change, it can only poll.
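JavaScript does not expose raw non-blocking system calls, so the following is only a simulation of the idea, not real kernel IO. The fake device and both function names are invented for this sketch: `read()` never blocks and returns `null` until the data is ready, so the caller has to keep asking.

```javascript
// A fake non-blocking "device": read() never blocks. It returns null for
// the first `emptyReads` calls and the data on every call after that.
function createFakeDevice(emptyReads, data) {
  let checks = 0;
  return {
    read() {
      checks += 1;
      return checks > emptyReads ? data : null;
    },
  };
}

// Poll the device once per event-loop turn until it yields data.
function pollUntilReady(device, done) {
  let attempts = 0;
  function check() {
    attempts += 1;
    const result = device.read();
    if (result === null) {
      setImmediate(check); // nothing yet: let other work run, then ask again
    } else {
      done(result, attempts);
    }
  }
  check();
}
```

The wasted attempts are the point: every empty `read()` is CPU time spent just to learn that nothing has changed yet.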

Different polling methods

read: the most primitive and lowest-performance method; it obtains the complete data by repeatedly checking the IO status
select: judges by the event status on the file descriptors, which costs comparatively less; the drawback is that it stores the status in an array of length 1024, so it can check at most 1024 file descriptors at a time
poll: to get around the select limit, poll stores the descriptors in a linked list; otherwise it works much the same way, and its performance is still poor when there are many file descriptors
epoll: the most efficient IO event notification mechanism on Linux. If no IO event is found when it enters polling, it sleeps until an event wakes it up
kqueue: similar to epoll, but it exists only on BSD-family systems such as FreeBSD

Although epoll uses events to cut CPU consumption, the CPU is almost idle while it sleeps. The asynchronous IO we actually want is a non-blocking call initiated by the application that needs no polling, whether by traversal or by event wake-up: the application moves straight on to the next task, and the data is passed to it through a signal or callback once the IO completes.

Linux also has a native AIO mechanism that delivers data through signals or callbacks, but it exists only on Linux and has the restriction that it cannot use the system cache.

The implementation of asynchronous IO in node

The conclusion first: Node implements asynchronous IO through multi-threading. What may be confusing is that although Node is multi-threaded internally, the JavaScript code we write runs on a single thread only. Node uses some threads to perform blocking IO, or non-blocking IO plus polling, to obtain the data; one thread does the computation; and the data obtained from IO is passed across through communication between the threads. This makes it easy to simulate asynchronous IO.
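Node's internal thread pool is not directly visible from JavaScript, but the same idea can be imitated with the `worker_threads` module. This is a rough sketch under that assumption, not how Node implements it internally; `readViaWorker` and `workerSource` are names made up here. A worker thread performs a blocking read and hands the result back to the JavaScript thread as a message, so the caller sees an ordinary asynchronous callback.

```javascript
const { Worker } = require("worker_threads");

// Source of the worker: it performs a *blocking* read, then posts the result.
const workerSource = `
  const { parentPort, workerData } = require("worker_threads");
  const fs = require("fs");
  const text = fs.readFileSync(workerData.file, "utf8"); // blocks only this worker
  parentPort.postMessage(text);
`;

// Looks asynchronous to the caller: returns at once, calls back with the data.
function readViaWorker(file, callback) {
  const worker = new Worker(workerSource, { eval: true, workerData: { file } });
  worker.once("message", (text) => callback(null, text)); // data passed between threads
  worker.once("error", (err) => callback(err));
}
```

The main thread stays free between the call and the `message` event, which is the effect the paragraph above describes.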

Besides IO, this also applies to other computer resources: because everything in Linux is a file, almost all computer resources, such as disks, hardware, and sockets, are abstracted as files. The rest of this article uses IO as the example when it discusses calls to computer resources.

Event loop

When the process starts, Node creates a loop similar to while(true). Each execution of the loop body is called a Tick. Below is the event loop flow chart in Node:

It is a very simple picture, so just a brief explanation: on each Tick, the loop takes a completed event from the IO observer (the event is a request object; put simply, it carries some data generated during the request). If the event has no callback function, the loop moves on to the next event (request object); if it has one, the loop executes the callback.
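The flow just described can be sketched in a few lines. This is a toy model, not Node's actual loop: the queue, `completeRequest`, and `runLoop` are names invented here, and a real loop would also wait for new IO instead of stopping when the queue is empty.

```javascript
// Completed requests waiting to be handled; a stand-in for what the IO observer holds.
const completedEvents = [];

// Simplified request object: some data plus an optional callback.
function completeRequest(data, callback) {
  completedEvents.push({ data, callback });
}

// Run Ticks until no events are left and report how many Ticks ran.
function runLoop() {
  let ticks = 0;
  while (completedEvents.length > 0) {
    ticks += 1; // one pass of the loop body is one Tick
    const event = completedEvents.shift();
    if (typeof event.callback === "function") {
      event.callback(event.data); // has a callback: execute it
    }
    // no callback: simply move on to the next event
  }
  return ticks;
}
```

For example, after queuing two requests with callbacks and one without, `runLoop()` takes three Ticks but only two callbacks run.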

Asynchronous IO details

Note: implementation details differ between platforms, and this picture hides the platform compatibility details. For example, on Windows, IOCP's PostQueuedCompletionStatus() submits the execution status and GetQueuedCompletionStatus() fetches the completed requests, and the thread pool details are handled inside IOCP. Platforms such as Linux implement this process through epoll, with the thread pool implemented by libuv itself.

`setTimeout` and `setInterval`

Besides IO and other computer resources that need asynchronous calls, Node also has some asynchronous APIs that have nothing to do with asynchronous IO: setTimeout, setInterval, process.nextTick, and setImmediate.

This section covers the first two APIs first.

Their implementation is similar in principle to asynchronous IO, except that the IO thread pool is not involved:

A timer created by setTimeout or setInterval is inserted into a red-black tree inside the timer observer. On every Tick, timer objects are iterated out of the red-black tree and checked to see whether they have passed their time limit. If one has, an event (request object) is pushed into the event queue and its callback function is executed in the event loop.
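The check that happens on each Tick can be sketched as follows. To keep it short, this sketch swaps the red-black tree for a plain sorted array, and it takes the current time as a parameter instead of reading a real clock; `addTimer` and `checkTimers` are hypothetical names, not Node APIs.

```javascript
// Pending timers, kept sorted by expiry time (soonest first).
// A sorted array stands in for the red-black tree here.
const timers = [];

function addTimer(expiresAt, callback, repeatEvery = 0) {
  timers.push({ expiresAt, callback, repeatEvery });
  timers.sort((a, b) => a.expiresAt - b.expiresAt);
}

// Called once per Tick: run every timer whose time limit has passed.
function checkTimers(now) {
  let fired = 0;
  while (timers.length > 0 && timers[0].expiresAt <= now) {
    const timer = timers.shift();
    timer.callback(now);
    fired += 1;
    if (timer.repeatEvery > 0) {
      // setInterval-style timer: schedule its next run
      addTimer(now + timer.repeatEvery, timer.callback, timer.repeatEvery);
    }
  }
  return fired;
}
```

Because a timer only fires when a Tick gets around to checking it, a slow Tick delays the callback, which is one reason timers are not perfectly precise.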

Red-black tree: a brief mention here. It is a specialized, self-balancing binary tree, and its search cost is essentially the depth of the tree, $O(\log_2 n)$.

Have you considered this question: why does the timer not need the thread pool? If you understood how asynchronous IO is implemented in the earlier sections, you should be able to explain it. Here is a brief explanation of the reason, to help it stick:

The IO thread pool in Node is a way to call IO and wait for the data to come back (see the implementation described earlier). It lets the single JavaScript thread call IO asynchronously without waiting for the IO to finish (the IO thread pool does the waiting) and still obtain the final data (through the observer pattern: the IO observer obtains the completion event from the thread pool, and the event loop executes the subsequent callback function). A timer has no IO data to wait for, so it has no need for the thread pool.

The above paragraph may be a bit brief. If you still don’t understand, you can look at the previous pictures~

process.nextTick and setImmediate

Both functions mean: run a function asynchronously, right away. So why not just use setTimeout(() => { ... }, 0)?

The timer is not precise enough

The timer uses a red-black tree for creating timer objects and iterating over them, which wastes performance

In other words, process.nextTick is more lightweight

More lightweight, specifically: each call to process.nextTick only puts the callback function into a queue, which is taken out and executed on the next Tick. The red-black tree used by timers costs $O(\log_2 n)$, whereas process.nextTick costs $O(1)$.

So how do process.nextTick and setImmediate differ? After all, both execute the callback function asynchronously right away.

process.nextTick's callback execution priority is higher than setImmediate

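The callback functions of process.nextTick are stored in an array, and all of them are executed in each round of the event loop. The callbacks of setImmediate are stored in a linked list, and one callback from the list is executed in each round of the loop (this describes the older Node implementation; current versions run every setImmediate callback already queued when the check phase starts).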
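The priority difference is easy to observe. In this small sketch (the `schedule` helper and the `order` array are just for illustration), setImmediate is registered first, yet the process.nextTick callback still runs before it:

```javascript
// Records the order in which the three pieces of code run.
const order = [];

function schedule(done) {
  // Registered first, yet it runs after the nextTick callback.
  setImmediate(() => {
    order.push("setImmediate");
    done(order);
  });
  process.nextTick(() => order.push("nextTick"));
  order.push("sync code");
}

// Example: schedule((result) => console.log(result));
// logs [ 'sync code', 'nextTick', 'setImmediate' ]
```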

Note: the callback of process.nextTick has higher priority than that of setImmediate because the event loop checks the observers in order. process.nextTick belongs to the idle observer and setImmediate belongs to the check observer: idle observer > IO observer > check observer.

High performance server

For handling network sockets, Node applies asynchronous IO as well. Requests heard on a network socket form events that are handed to the IO observer, and the event loop keeps processing these network IO events. If corresponding callback functions were passed in at the JavaScript level, they are executed in the event loop to handle these network requests.

Common server models:

Synchronous
One process per request
One thread per request

Node, by contrast, handles these requests in an event-driven way. There is no need to create an extra thread for each request, so the overhead of creating and destroying threads is saved. And because there are few threads (only the ones Node implements internally), the operating system has less to schedule and the cost of context switching is very low.
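As a small illustration of the event-driven model, here is a sketch built on Node's `http` module. The handler body and the names `handleRequest` and `createDemoServer` are invented for this example. One callback is registered once, and the event loop calls it for every request event; no thread is created per request.

```javascript
const http = require("http");

// One callback handles every request; no thread is created per request.
function handleRequest(req, res) {
  res.writeHead(200, { "Content-Type": "text/plain" });
  res.end(`handled ${req.method} ${req.url}\n`);
}

// The server only registers the callback; the event loop invokes it
// whenever a request event arrives on the socket.
function createDemoServer() {
  return http.createServer(handleRequest);
}
```

To try it, call `createDemoServer().listen(3000)` (the port is arbitrary) and request `http://localhost:3000/` a few times.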

Classic problem: the avalanche problem

Problem description: when the server has just started, the cache holds no data. If traffic is heavy, the same SQL statement is sent to the database over and over, which hurts performance.

Solution:
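const events = require("events");

// `db` is assumed to be an existing database client with a select(sql, callback) method.
const proxy = new events.EventEmitter();
let status = "ready"; // status lock, prevents the same query from being issued repeatedly

const select = function (callback) {
    proxy.once("selected", callback); // bind a one-time listener for the "selected" event
    if (status === "ready") {
        status = "pending";
        // sql
        db.select("SQL", (res) => {
            status = "ready"; // release the lock before notifying the waiting callbacks
            proxy.emit("selected", res); // fire the event and pass the query result along
        });
    }
};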
The code uses once to push the callbacks of all requests into the event queue, and relies on its property of removing the listener after a single run to ensure that each callback function is executed only once. For the same SQL statement, the query is guaranteed to run only once between its start and its end. New calls that arrive in the meantime just wait in the queue for the data, and once the result comes back it is shared by all of them.
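To watch the status lock at work without a real database, here is a self-contained variation. The `fakeDb` object and the `createGuardedSelect` wrapper are hypothetical stand-ins added for this demo. It also calls `setMaxListeners(0)`, since many callers waiting on one event would otherwise trigger EventEmitter's listener-count warning.

```javascript
const { EventEmitter } = require("events");

// Builds a select() guarded by a status lock, for any db with a select(sql, cb) method.
function createGuardedSelect(db) {
  const proxy = new EventEmitter();
  proxy.setMaxListeners(0); // many callers may wait on the same event
  let status = "ready";
  return function select(callback) {
    proxy.once("selected", callback);
    if (status === "ready") {
      status = "pending";
      db.select("SQL", (res) => {
        status = "ready";
        proxy.emit("selected", res);
      });
    }
  };
}

// Hypothetical in-memory "database" that counts how often it is queried.
const fakeDb = {
  queries: 0,
  select(sql, cb) {
    this.queries += 1;
    setImmediate(() => cb(["row"])); // answer asynchronously, like a real query
  },
};
```

If fifty callers invoke the guarded `select` in the same Tick, `fakeDb.queries` ends at 1 and all fifty callbacks receive the same rows.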
