A Node.js process runs on a single core, so special attention is needed when developing a scalable server on top of it.
Thanks to a stable set of process-management APIs and the growth of native extensions, there are many different ways to design a Node.js application that can be parallelized. In this blog post we compare these possible architectures.
This article also introduces compute-cluster: a small Node.js library that makes it easy to manage a pool of processes and distribute computational work across them.
Problems encountered
In our Mozilla Persona project we need to handle a large number of requests with very different characteristics, and we chose Node.js for the job.
The 'Interactive' requests we designed require only lightweight computation but must respond quickly, so the UI never feels stuck. By comparison, a 'Batch' operation takes about half a second to process, and other factors can introduce even longer delays.
There are many possible designs that meet these needs. Weighing scalability and cost, we settled on the following key requirements:
- Full use of all available cores;
- Responsiveness: interactive requests stay fast even while heavy computation runs;
- Simplicity: the solution integrates cleanly, ideally behind the existing asynchronous API;
- Grace: when load surges past capacity, the system degrades predictably rather than collapsing.
With these requirements we can evaluate the candidate architectures clearly and purposefully.
Option 1: Process directly on the main thread
Processing data directly on the main thread performs very badly:
You cannot take advantage of more than one core, and every interactive request must wait behind whatever computation is currently running, which is anything but graceful.
The only advantage of this approach is its simplicity:
function myRequestHandler(request, response) {
  // Let's bring everything to a grinding halt for half a second.
  var results = doComputationWorkSync(request.somesuch);
}
If a Node.js process must handle multiple requests at the same time, doing the work synchronously like this spells trouble.
Option 2: Do it asynchronously
Would using asynchronous APIs that do their work in the background bring a big performance improvement?
Not necessarily. It depends on whether the work actually runs in the background.
If the computation is implemented in JavaScript, or in native code that still runs on the main thread, performance is no better than with straight synchronous processing, and the asynchronous wrapper gains nothing.
Consider the following code:
function doComputationWork(input, callback) {
  // Because the internal implementation of this asynchronous
  // function is itself synchronously run on the main thread,
  // you still starve the entire process.
  var output = doComputationWorkSync(input);
  process.nextTick(function() {
    callback(null, output);
  });
}

function myRequestHandler(request, response) {
  // Even though this *looks* better, we're still bringing everything
  // to a grinding halt.
  doComputationWork(request.somesuch, function(err, results) {
    // ... do something with results ...
  });
}
The key point is that using Node.js's asynchronous APIs does not, by itself, give you a multi-process application.
Option 3: Do it asynchronously with a threaded library
A properly implemented library written in native code can, when called from Node.js, break past this limitation and deliver true multi-threading.
There are many such libraries; the bcrypt library written by Nick Campbell is an excellent example.
Test it on a four-core machine and you will see something magical: four times the usual throughput, with nearly all resources consumed! But test it on a 24-core machine and the result barely changes: four cores sit at roughly 100% usage while the rest stay essentially idle.
The problem is that this library uses Node.js's internal thread pool, which was never meant for this kind of computation. Moreover, the pool's size is hard-coded, with a maximum of four threads.
Beyond the hard-coded cap, there is a deeper cause of the problem:
a library with built-in threading cannot effectively exploit the multi-core advantage in this scenario; it reduces the program's responsiveness, and as load grows the program performs worse and worse.
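As a minimal sketch of the effect described above (assuming node.bcrypt.js's documented callback API, bcrypt.hash(data, rounds, callback)), firing many hashes at once never keeps more than four threads busy, because they all queue on the same fixed-size internal pool:

var bcrypt = require('bcrypt');

// Queue far more CPU-bound hashing jobs than the internal
// thread pool (4 threads) can run at once.
var pending = 24;
var start = Date.now();
for (var i = 0; i < 24; i++) {
  bcrypt.hash('password-' + i, 10, function(err, hash) {
    // On a 24-core machine only ~4 of these run simultaneously,
    // so total time barely improves over a 4-core machine.
    if (--pending === 0) {
      console.log('all done in', Date.now() - start, 'ms');
    }
  });
}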
Option 4: Use Node.js's cluster module
Node.js 0.6.x and later ship a cluster module that lets you create a group of processes sharing a single socket, so they can split the load between them.
What happens if you adopt one of the approaches above and use the cluster module at the same time?
The resulting design has the same drawbacks as synchronous processing or the built-in thread pool: slow responses, with no grace whatsoever.
Simply adding more running instances does not always solve the problem.
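For concreteness, here is a minimal sketch of the cluster module's standard fork-per-core pattern (the HTTP handler body is a placeholder):

var cluster = require('cluster');
var http = require('http');
var numCPUs = require('os').cpus().length;

if (cluster.isMaster) {
  // Fork one worker per core; all workers share the same listening socket.
  for (var i = 0; i < numCPUs; i++) {
    cluster.fork();
  }
} else {
  http.createServer(function(req, res) {
    // If this handler performs heavy synchronous computation, each busy
    // worker still blocks its own interactive requests.
    res.writeHead(200);
    res.end('hello\n');
  }).listen(8000);
}

Each worker is still a single-threaded Node.js process, which is why cluster alone cannot fix the blocking problem.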
Option 5: Introduce the compute-cluster module
Our solution in Persona is to maintain a cluster of single-purpose (yet distinct) computation processes.
In the process, we wrote the compute-cluster library.
This library spawns and manages child processes on demand, giving your code a local cluster of child processes to distribute work across.
Usage example:
const computecluster = require('compute-cluster');

// allocate a compute cluster
var cc = new computecluster({ module: './worker.js' });

// run work in parallel
cc.enqueue({ input: "foo" }, function (error, result) {
  console.log("foo done", result);
});

cc.enqueue({ input: "bar" }, function (error, result) {
  console.log("bar done", result);
});
The file worker.js responds to the message event and processes incoming requests:
process.on('message', function(m) {
  // do lots of work here, and we don't care that we're blocking the
  // main thread because this process is intended to do one thing at a time.
  var output = doComputationWorkSync(m.input);
  process.send(output);
});
Because compute-cluster sits behind the same asynchronous API as before, it can be integrated without changing the calling code, and true multi-core parallel processing is achieved with a minimal amount of code.
Let's evaluate this solution against the four requirements.
Multi-core parallelism: the child processes use all available cores.
Responsiveness: because the managing process only spawns children and delivers messages, it is idle most of the time and free to handle interactive requests.
Even when the machine is under heavy load, we can also use the operating system's scheduler to raise the priority of the management process.
Simplicity: the asynchronous API hides the implementation details, so the module can be dropped into an existing project without even changing the calling code.
That leaves grace: can we keep the system's efficiency from collapsing when load surges suddenly?
The goal remains that even under surging pressure, the system keeps running efficiently and handles as many requests as it can.
To make that possible, compute-cluster does more than manage child processes and pass messages; it also tracks bookkeeping information.
It records how many child processes are currently running and the average time each job takes to complete.
With these records it can predict how long a newly enqueued request will wait before a child process picks it up.
Combined with a user-supplied parameter, max_request_time, this lets it fail requests that would probably time out, without processing them at all.
This feature makes it easy to express your code in terms of user experience. For example, "users should not wait more than 10 seconds to log in" translates roughly into a max_request_time of 7 seconds (leaving headroom for network transmission time).
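As a sketch, the login example might look like this (the option name follows the text above; the unit shown is an assumption, so check the library's documentation):

var computecluster = require('compute-cluster');

// "Users should not wait more than 10 seconds to log in", minus
// headroom for the network, becomes a 7-second compute budget.
var cc = new computecluster({
  module: './worker.js',
  max_request_time: 7
});

cc.enqueue({ input: "foo" }, function (error, result) {
  if (error) {
    // Requests predicted to exceed the budget fail fast under load
    // instead of being processed after the user has given up.
    return console.log("rejected under load:", error);
  }
  console.log("done", result);
});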
When we stress-tested the Persona service, the results were very satisfying.
Under extreme pressure we could still serve authenticated users, while turning away a portion of unauthenticated users with a clear error message.