Explore the theory behind the event processing loop in Node.js

Question

I'm looking at this gist describing file walking algorithms in JavaScript:

    // ES6 version using asynchronous iterators, compatible with node v10.0+
    const fs = require("fs");
    const path = require("path");
    async function* walk(dir) {
        for await (const d

P粉328911308 · Answer

Well, at a high level, the first part of that sentence and the second part of that sentence conflict. If the server is inefficient, then it will not be able to properly respond to a bunch of client requests arriving at the same time. Therefore, you want to make your server more efficient so that it can be as responsive to client requests as possible.

Now, we need to go back a few steps and understand what exactly is going on in the code you've shown. First, while the language happens to be Javascript, the core logic of this code and the way it uses async, await, and generators is not driven by the Javascript language alone. It is driven by the specific environment in which the Javascript runs, in this case nodejs.

This environment uses an event loop and runs a single Javascript thread. Other operating system threads are used for various system things and some library implementations, but when nodejs runs your Javascript, it only runs one piece of Javascript at a time (single thread).

Also, when you design a server, you want it to be able to respond to a large number of incoming requests. You don't want it to have to process one request and have all other requests wait until the first request completes before starting the next one. However, the nodejs event loop model does not use multiple threads and therefore does not directly run multiple request handlers at the same time.

The solution deployed by nodejs comes from the fact that, for most server request handlers, the main activity, and what takes up most of the time spent handling a request, is I/O (such as network, file, or database I/O). These are lower-level operations that have their own (non-Javascript) implementation.

Therefore, it deploys an asynchronous model for all I/O operations. A well-written server can initiate an asynchronous I/O operation, and while that operation is being processed (outside the code running in the nodejs interpreter itself), the interpreter and the nodejs event loop are free to do other things and handle another request. Some time later, when the asynchronous operation completes, an event is inserted into the event loop, and once the interpreter finishes whatever it was doing at that moment, it can process the result of the asynchronous operation and continue handling the original request.
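To make that ordering concrete, here is a minimal sketch. The function name demoNonBlockingRead is made up for illustration; fs.promises.readFile is the real nodejs API. It starts a file read, does some other work while the read is pending, and records the order in which things happen.

```javascript
const fs = require("fs");

// Hypothetical helper: records the order of events so the interleaving is visible.
async function demoNonBlockingRead(file) {
    const order = [];

    // Start the read. The actual I/O happens outside the Javascript thread.
    const pending = fs.promises.readFile(file, "utf8").then((text) => {
        order.push("read finished");
        return text;
    });

    // This runs immediately, before the read has completed.
    order.push("other work while the read is pending");

    // Only now do we wait for the result of the read.
    await pending;
    return order;
}

// Possible usage:
// demoNonBlockingRead("/some/file.txt").then(console.log);
```

The "other work" entry always comes first, because the callback for the completed read can only run after the currently executing Javascript has returned control to the event loop.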

This way, Javascript is only executed in a single thread, but many incoming requests can be "processed" at the same time.

Yes, this is a completely different model from the old C/C++ threading model. You either learn this different model in order to write efficient and effective server code in nodejs, or you don't. If you want to stick with the old model, then choose a different environment that runs request handlers in threads (Java, C++, etc.) and aim to do that well (with the associated design and testing overhead of writing it correctly and thoroughly testing it for all multi-threaded concurrency issues).

One of the great benefits of the nodejs model is that it is less susceptible to many of the concurrency issues present in multi-threaded execution models. The nodejs model also has some shortcomings that occasionally require workarounds. For example, if you have CPU-intensive code in a request handler written in Javascript, it will still bog things down, and you'll need to find a way to move the CPU-intensive code out of the main event loop and onto a different thread or process (maybe even a work queue). However, the I/O is all asynchronous and can remain on the main thread without causing any problems.
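As a rough sketch of moving CPU-intensive work off the main thread, the built-in worker_threads module can be used. The naive Fibonacci function and the name fibInWorker are placeholders I chose for illustration. A real server would more likely keep a pool of workers rather than start a new one per call.

```javascript
const { Worker } = require("worker_threads");

// Source of the worker, kept inline so the sketch is self-contained.
// The naive Fibonacci is a hypothetical stand-in for any CPU-heavy task.
const workerSource = `
const { parentPort, workerData } = require("worker_threads");
function fib(n) { return n < 2 ? n : fib(n - 1) + fib(n - 2); }
parentPort.postMessage(fib(workerData));
`;

// Runs the computation on a separate thread and resolves with its result.
// The main event loop stays free to handle other requests in the meantime.
function fibInWorker(n) {
    return new Promise((resolve, reject) => {
        const worker = new Worker(workerSource, { eval: true, workerData: n });
        worker.once("message", resolve);
        worker.once("error", reject);
        worker.once("exit", (code) => {
            if (code !== 0) reject(new Error(`worker exited with code ${code}`));
        });
    });
}

// Possible usage:
// fibInWorker(30).then((value) => console.log(value));
```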

The less code that needs to be in separate concurrent threads, the fewer concurrency bugs you are likely to encounter, and the code is easier to fully test.

Well, you want a loop because you're trying to loop over something. You want to use asynchronous operations in the loop when you don't want to block the event loop, or when they are the only type of operation available for the task (such as doing a lookup in a database).

Using asynchronous operations is not about optimizing something. It is about the core design of writing good server code and not blocking the event loop. And, in fact, many interfaces in nodejs (such as database interfaces or network interfaces) only provide asynchronous interfaces.

The way you asked this question suggests that you would benefit from better understanding the core Nodejs architecture and reading more about how the event loop works and how asynchronous I/O operations work.

First of all, if you are using an asynchronous API (such as a network or database API), then you have no choice. You will write asynchronous code to use this API. If you have a choice between an asynchronous and a synchronous API (as you do with file system access in Node.js), then you can choose whether or not to block the event loop on every API call. If you block the event loop, you will seriously harm the scalability and responsiveness of your server.
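For the file system case, the choice might look like the following sketch. The function names readConfigSync and readConfigAsync are hypothetical; fs.readFileSync and fs.promises.readFile are the real APIs.

```javascript
const fs = require("fs");

// Blocking variant: no other Javascript (timers, other request handlers)
// can run until the file has been read completely.
function readConfigSync(file) {
    return fs.readFileSync(file, "utf8");
}

// Non-blocking variant: returns a promise right away, and the event loop
// stays free while the read is in progress.
function readConfigAsync(file) {
    return fs.promises.readFile(file, "utf8");
}

// Possible usage inside a request handler:
// const text = await readConfigAsync("/some/config.json");
```

Both return the same content. The difference is whether the rest of the server can make progress while the disk is busy.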

This particular code example does attempt to use the kitchen sink of asynchronous language features in a single implementation: async, await, generators, and yield. I generally don't do this. The whole point of this implementation is to be able to create an interface that can be used very simply like this:

for await (const p of walk('/tmp/')) {
    ...
}

And walk() is asynchronous internally. This implementation hides almost all of the complexity of the asynchronous implementation from users of the API, which makes the API easier to code against. By putting a single await in the right place, users of the API can code almost as if it were synchronous. The purpose of these Javascript language features (promises, async, await, generators, etc.) is to make asynchronous operations easier to code.
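For reference, here is a minimal sketch of what such a walk() could look like. This is my own illustration of the technique, not necessarily the code from the gist. It assumes fs.promises.readdir with the withFileTypes option, and listFiles is a hypothetical helper added to show the consumer side.

```javascript
const fs = require("fs");
const path = require("path");

// Async generator: recursively yields the path of every regular file under `dir`.
async function* walk(dir) {
    // Asynchronous directory read; the event loop is free while it is pending.
    const entries = await fs.promises.readdir(dir, { withFileTypes: true });
    for (const entry of entries) {
        const full = path.join(dir, entry.name);
        if (entry.isDirectory()) {
            // Delegate to the nested async generator for subdirectories.
            yield* walk(full);
        } else if (entry.isFile()) {
            yield full;
        }
    }
}

// Hypothetical consumer: collects all paths using `for await`.
async function listFiles(dir) {
    const files = [];
    for await (const p of walk(dir)) {
        files.push(p);
    }
    return files;
}

// Possible usage:
// listFiles("/tmp/").then((files) => console.log(files.length));
```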

Advantages of the event loop model

Programming is simple. You usually don't have to deal with concurrency issues when accessing shared data from threads, because all Javascript runs in the same thread, so all shared data accesses come from the same thread. You don't need a mutex to access shared data. You don't face any risk of deadlock with these mutexes.

Fewer errors. When shared data is accessed from multiple threads, it is much more difficult to write error-free code. If not written perfectly, the code may be subject to race conditions or lack concurrency protection. Moreover, these race conditions are often difficult to test for and may not become apparent until your server is under heavy load, and even then they are not easy to reproduce.

Higher scalability (in some cases). For code that is primarily I/O bound, a cooperative event loop model may lead to greater scalability. This is because each request being processed does not require a separate operating system thread with its added overhead. Instead, there is only a small amount of application state, usually held in closures waiting for the next callback or promise.
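The first point, shared data without mutexes, can be sketched like this. The names handled, handleRequest, and handleMany are made up, and the timer is a stand-in for real asynchronous I/O.

```javascript
// Shared state touched by many concurrent "requests". No lock is needed.
let handled = 0;

async function handleRequest(id) {
    // Simulated asynchronous I/O (hypothetical stand-in for a database call).
    await new Promise((resolve) => setTimeout(resolve, 5));

    // This read-modify-write runs to completion: no other Javascript can
    // interleave between the read and the write.
    handled = handled + 1;
    return id;
}

// Starts n requests at once and reports how many were counted.
async function handleMany(n) {
    handled = 0;
    await Promise.all(Array.from({ length: n }, (_, i) => handleRequest(i)));
    return handled;
}

// Possible usage:
// handleMany(100).then((count) => console.log(count));
```

No update is lost here even though all the requests are in flight at the same time. Note that this guarantee only covers code between two awaits. If an await sat between reading the counter and writing it back, updates could still be lost.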

Articles about event loop programming

Why the cool kids use event loops - This is specifically about using event loops in Java programming, but the discussion applies to any environment

Case of Threads and Events