A Node.js process runs on a single core, so special attention is needed when developing a scalable server on top of it.
Thanks to a stable set of process-management APIs and the growth of native extensions, there are many different ways to design a Node.js application that can be parallelized. In this blog post we compare these possible architectures.
This article also introduces compute-cluster: a small Node.js library that makes it easy to manage a pool of processes and distribute computational work across them.
Problems encountered
In our Mozilla Persona project we need to handle a large number of requests with very different characteristics, and we chose Node.js for the job.
The 'Interactive' requests we designed require only lightweight computation but must respond quickly, so the UI never feels stuck. By comparison, a 'Batch' operation takes about half a second to process, and other factors can introduce even longer delays.
There are many possible designs that meet these needs. Weighing scalability and cost, we settled on the following key requirements:
- Full use of all available cores;
- Responsiveness: interactive requests stay fast even while heavy computation runs;
- Simplicity: the solution integrates cleanly, ideally behind the existing asynchronous API;
- Grace: when load surges past capacity, the system degrades predictably rather than collapsing.
With these requirements we can evaluate the candidate architectures clearly and purposefully.
Option 1: Process directly on the main thread
Processing data directly on the main thread performs very badly:
You cannot take advantage of more than one core, and every interactive request must wait behind whatever computation is currently running, which is anything but graceful.
The only advantage of this approach is its simplicity:
function myRequestHandler(request, response) {
  // Let's bring everything to a grinding halt for half a second.
  var results = doComputationWorkSync(request.somesuch);
}
If a Node.js process must handle multiple requests at the same time, doing the work synchronously like this spells trouble.
Option 2: Do it asynchronously
Would using asynchronous APIs that do their work in the background bring a big performance improvement?
Not necessarily. It depends on whether the work actually runs in the background.
If the computation is implemented in JavaScript, or in native code that still runs on the main thread, performance is no better than with straight synchronous processing, and the asynchronous wrapper gains nothing.
Consider the following code:
function doComputationWork(input, callback) {
  // Because the internal implementation of this asynchronous
  // function is itself synchronously run on the main thread,
  // you still starve the entire process.
  var output = doComputationWorkSync(input);
  process.nextTick(function() {
    callback(null, output);
  });
}

function myRequestHandler(request, response) {
  // Even though this *looks* better, we're still bringing everything
  // to a grinding halt.
  doComputationWork(request.somesuch, function(err, results) {
    // ... do something with results ...
  });
}
The key point is that using Node.js's asynchronous APIs does not, by itself, give you a multi-process application.
Option 3: Do it asynchronously with a threaded library
A properly implemented library written in native code can, when called from Node.js, break past this limitation and deliver true multi-threading.
There are many such libraries; the bcrypt library written by Nick Campbell is an excellent example.
Test it on a four-core machine and you will see something magical: four times the usual throughput, with nearly all resources consumed! But test it on a 24-core machine and the result barely changes: four cores sit at roughly 100% usage while the rest stay essentially idle.
The problem is that this library uses Node.js's internal thread pool, which was never meant for this kind of computation. Moreover, the pool's size is hard-coded, with a maximum of four threads.
Beyond the hard-coded cap, there is a deeper cause of the problem:
a library with built-in threading cannot effectively exploit the multi-core advantage in this scenario; it reduces the program's responsiveness, and as load grows the program performs worse and worse.
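As a minimal sketch of the effect described above (assuming node.bcrypt.js's documented callback API, bcrypt.hash(data, rounds, callback)), firing many hashes at once never keeps more than four threads busy, because they all queue on the same fixed-size internal pool:

var bcrypt = require('bcrypt');

// Queue far more CPU-bound hashing jobs than the internal
// thread pool (4 threads) can run at once.
var pending = 24;
var start = Date.now();
for (var i = 0; i < 24; i++) {
  bcrypt.hash('password-' + i, 10, function(err, hash) {
    // On a 24-core machine only ~4 of these run simultaneously,
    // so total time barely improves over a 4-core machine.
    if (--pending === 0) {
      console.log('all done in', Date.now() - start, 'ms');
    }
  });
}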
Option 4: Use Node.js's cluster module
Node.js 0.6.x and later ship a cluster module that lets you create a group of processes sharing a single socket, so they can split the load between them.
What happens if you adopt one of the approaches above and use the cluster module at the same time?
The resulting design has the same drawbacks as synchronous processing or the built-in thread pool: slow responses, with no grace whatsoever.
Simply adding more running instances does not always solve the problem.
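For concreteness, here is a minimal sketch of the cluster module's standard fork-per-core pattern (the HTTP handler body is a placeholder):

var cluster = require('cluster');
var http = require('http');
var numCPUs = require('os').cpus().length;

if (cluster.isMaster) {
  // Fork one worker per core; all workers share the same listening socket.
  for (var i = 0; i < numCPUs; i++) {
    cluster.fork();
  }
} else {
  http.createServer(function(req, res) {
    // If this handler performs heavy synchronous computation, each busy
    // worker still blocks its own interactive requests.
    res.writeHead(200);
    res.end('hello\n');
  }).listen(8000);
}

Each worker is still a single-threaded Node.js process, which is why cluster alone cannot fix the blocking problem.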
Option 5: Introduce the compute-cluster module
Our solution in Persona is to maintain a cluster of single-purpose (yet distinct) computation processes.
In the process, we wrote the compute-cluster library.
This library spawns and manages child processes on demand, giving your code a local cluster of child processes to distribute work across.
Usage example:
const computecluster = require('compute-cluster');

// allocate a compute cluster
var cc = new computecluster({ module: './worker.js' });

// run work in parallel
cc.enqueue({ input: "foo" }, function (error, result) {
  console.log("foo done", result);
});

cc.enqueue({ input: "bar" }, function (error, result) {
  console.log("bar done", result);
});
The file worker.js responds to the message event and processes incoming requests:
process.on('message', function(m) {
  // do lots of work here, and we don't care that we're blocking the
  // main thread because this process is intended to do one thing at a time.
  var output = doComputationWorkSync(m.input);
  process.send(output);
});
Because compute-cluster sits behind the same asynchronous API as before, it can be integrated without changing the calling code, and true multi-core parallel processing is achieved with a minimal amount of code.
Let's evaluate this solution against the four requirements.
Multi-core parallelism: the child processes use all available cores.
Responsiveness: because the managing process only spawns children and delivers messages, it is idle most of the time and free to handle interactive requests.
Even when the machine is under heavy load, we can also use the operating system's scheduler to raise the priority of the management process.
Simplicity: the asynchronous API hides the implementation details, so the module can be dropped into an existing project without even changing the calling code.
That leaves grace: can we keep the system's efficiency from collapsing when load surges suddenly?
The goal remains that even under surging pressure, the system keeps running efficiently and handles as many requests as it can.
To make that possible, compute-cluster does more than manage child processes and pass messages; it also tracks bookkeeping information.
It records how many child processes are currently running and the average time each job takes to complete.
With these records it can predict how long a newly enqueued request will wait before a child process picks it up.
Combined with a user-supplied parameter, max_request_time, this lets it fail requests that would probably time out, without processing them at all.
This feature makes it easy to express your code in terms of user experience. For example, "users should not wait more than 10 seconds to log in" translates roughly into a max_request_time of 7 seconds (leaving headroom for network transmission time).
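As a sketch, the login example might look like this (the option name follows the text above; the unit shown is an assumption, so check the library's documentation):

var computecluster = require('compute-cluster');

// "Users should not wait more than 10 seconds to log in", minus
// headroom for the network, becomes a 7-second compute budget.
var cc = new computecluster({
  module: './worker.js',
  max_request_time: 7
});

cc.enqueue({ input: "foo" }, function (error, result) {
  if (error) {
    // Requests predicted to exceed the budget fail fast under load
    // instead of being processed after the user has given up.
    return console.log("rejected under load:", error);
  }
  console.log("done", result);
});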
When we stress-tested the Persona service, the results were very satisfying.
Under extreme pressure we could still serve authenticated users, while turning away a portion of unauthenticated users with a clear error message.