How to use cluster cluster in node-JS Tutorial-php.cn

This time I will show you how to use the cluster cluster in node, and what are the precautions for using the cluster cluster in node. The following is a practical case, let's take a look.

Conclusion

Although the number of worker processes is usually set to the number of CPU processes, this number can be exceeded, and the main process is not created first.

if (cluster.isMaster) {
 // 循环 fork 任务 CPU i5-7300HQ 四核四进程
 for (let i = 0; i < 6; i++) {
  cluster.fork()
 }
 console.log(chalk.green(`主进程运行在${process.pid}`))
} else {
 app.listen(1314) // export app 一个 Koa 服务器的实例
 console.log(chalk.green(`子进程运行在${process.pid}`))
}
#子进程运行在17768
#子进程运行在5784
#子进程运行在11232
#子进程运行在7904
#主进程运行在12960
#子进程运行在4300
#子进程运行在16056

Copy after login

In the main process, cluster represents the main process (used for monitoring and sending events), process is its own process, and worker represents the child process, which is obtained through cluster.workers

In the child process process represents a child process (used for listening and sending events), or the current child process can be represented through cluster.worker

cluster.worker.process is equivalent to process (in a child process)

Main process and child processes communicate with each other

Scheduling Strategies, including cluster.SCHED_RR for cycle counting, and cluster.SCHED_NONE depending on the operating system. This is a global setting that will take effect immediately when the first worker process is spawned or cluster.setupMaster() is called. SCHED_RR is the default setting in all operating systems except Windows. Windows systems will also change to SCHED_RR as long as libuv can efficiently distribute IOCP handles without causing serious performance hits. cluster.schedulingPolicy can be implemented by setting the NODE_CLUSTER_SCHED_POLICY environment variable . Valid values for this environment variable include "rr" and "none". RR is Round-Robin polling scheduling, that is, each child process has an equal chance of obtaining events, which is the default except for windows. The scheduling strategy under windows is very strange, as shown in the figure below. Currently, there is no relevant API to set the scheduling strategy algorithm. node only provides us with two values

Process scheduling algorithm.png

The test data is 1000 concurrent requests, repeated test 20 times, performance under windows. It can be seen that the scheduling algorithm of windows is chaotic. If it is the RR algorithm, the scheduling of the four processes should be on the same horizontal line. We have not set up a local Linux environment for the time being. Students who have the conditions can help test it.

Cluster's scheduling algorithm is currently related to the system's

authentication issues between multiple processes

Note: Node.js Routing logic is not supported. Therefore, when designing applications, you should not rely too much on memory data objects (such as

session

s and logins, etc.). Because each worker process is an independent process, they can be shut down or regenerated at any time as needed without affecting the normal operation of other processes. As long as there are worker processes alive, the server can continue to handle connections. If there are no surviving worker processes, existing connections are lost and new connections are rejected. Node.js does not automatically manage the number of worker processes, but the specific application should manage the process pool according to actual needs.

文档中已明确说明了，每一个工作进程都是独立的，并且互相之间除了能够进行通信外，没有办法共享内存。所以在设计鉴权的时候，有两种方法

通过共有的主进程存储鉴权信息，每次前端提交帐号密码，授权完成后，将 token 发送给主进程，下次前台查询时先在主进程获取授权信息
通过统一的外部 redis 存取

两种方法看来还是第二种好的不要太多，因此多进程的环境下，应该使用外部数据库统一存储 token 信息

进一步的子进程间通信思考

虽然 node 中并没有直接提供的进程间通讯功能，但是我们可以通过主进程相互协调进程间的通讯功能，需要定义标准的通信格式，例如

interface cmd {
 type: string
 from: number
 to: number
 msg: any
}

Copy after login

这样通过统一的格式，主进程就可以识别来自各个进程间的通信，起到进程通信中枢的功能

egg.js 中 agent 的实现

        +--------+     +-------+
        | Master |<-------->| Agent |
        +--------+     +-------+
        ^  ^  ^
        /  |   \
       /   |    \
      /    |     \
     v     v     v
+----------+  +----------+  +----------+
| Worker 1 |  | Worker 2 |  | Worker 3 |
+----------+  +----------+  +----------+

Copy after login

我们看到 egg 在多进程模型之间实现了一个 agent 进程，这个进程主要负责对整个系统的定期维护

说到这里，Node.js 多进程方案貌似已经成型，这也是我们早期线上使用的方案。但后来我们发现有些工作其实不需要每个 Worker 都去做，如果都做，一来是浪费资源，更重要的是可能会导致多进程间资源访问冲突。举个例子：生产环境的日志文件我们一般会按照日期进行归档，在单进程模型下这再简单不过了：

每天凌晨 0 点，将当前日志文件按照日期进行重命名

销毁以前的文件句柄，并创建新的日志文件继续写入

试想如果现在是 4 个进程来做同样的事情，是不是就乱套了。所以，对于这一类后台运行的逻辑，我们希望将它们放到一个单独的进程上去执行，这个进程就叫 Agent Worker，简称 Agent。Agent 好比是 Master 给其他 Worker 请的一个『秘书』，它不对外提供服务，只给 App Worker 打工，专门处理一些公共事务。

这样我们可以指定一个进程作为 agent 进程，用于实现自己定义的事务。在 egg 中，主线程启动后首先 fork agent进程，当 agent 进程启动完成后再启动具体的 worker 进程。参照上面的代码，相信这部分逻辑现在也不难实现了。这样 agent 就会获得 id 为1的进程

相信看了本文案例你已经掌握了方法，更多精彩请关注php中文网其它相关文章！