An in-depth analysis of the module loading mechanism of Node.js

青灯夜游
Release: 2020-09-02 10:36:43

Modules are a basic and important concept in Node.js. The native libraries are exposed as modules, and third-party libraries are managed and referenced as modules too. This article starts from the basic principles of modules and ends by using those principles to implement a simple module loading mechanism ourselves, that is, our own require.

Node.js uses JavaScript with CommonJS modules, and npm or yarn as its package manager.

Simple example

As usual, let's look at a simple example before explaining the principles, and work from it step by step toward a deeper understanding. To export something in Node.js you use module.exports, which can export almost any kind of JS value, including strings, functions, objects and arrays. Let's first create a.js, which exports the simplest possible hello world:

// a.js 
module.exports = "hello world";

Then create b.js, which exports a function:

// b.js
function add(a, b) {
  return a + b;
}

module.exports = add;

Then use them in index.js by requiring them. The value returned by require is the module.exports of the corresponding file:

// index.js
const a = require('./a.js');
const add = require('./b.js');

console.log(a);      // "hello world"
console.log(add(1, 2));    // b.js exports an add function that can be called directly; this prints 3

require will run the target file first

When we require a module, we don't just pick up its module.exports; the whole file is run from the top. module.exports = XXX is just one line of code in that file and, as we will see later, all it does is set the exports property of the module object. For example, here is c.js:

// c.js
let c = 1;

c = c + 1;

module.exports = c;

c = 6;

c.js exports c after a few steps of computation. When execution reaches the line module.exports = c;, the value of c is 2, so requiring c.js yields 2. Changing c to 6 afterwards has no effect on the value already exported:

const c = require('./c.js');

console.log(c);  // c is 2

The variable c in c.js is a primitive, so the later c = 6; does not affect module.exports. What if it were a reference type? Let's try:

// d.js
let d = {
  num: 1
};

d.num++;

module.exports = d;

d.num = 6;

Then require it in index.js:

const d = require('./d.js');

console.log(d);     // { num: 6 }

Here the assignment to d.num made after the module.exports line still takes effect, because d is an object, a reference type, and the object can be modified through that reference. In fact, a reference type can be modified not only after the module.exports line but also from outside the module, for example directly in index.js:

const d = require('./d.js');

d.num = 7;
console.log(d);     // { num: 7 }

require and module.exports are not black magic

As the previous examples show, require and module.exports do nothing complicated. Imagine a global object {} that starts out empty. When you require a file, the file is taken and executed; if it assigns module.exports, that value is stored in the object under a key that is the file name. In the end the object looks like this:

{
  "a.js": "hello world",
  "b.js": function add(){},
  "c.js": 2,
  "d.js": { num: 6 }
}

When you require the same file again, if the object already holds a value for it, that value is returned directly. Otherwise the steps above are repeated: the target file is executed, its module.exports is added to the global object, and the value is returned to the caller. This global object is the cache we often hear about. So require and module.exports involve no black magic: they run the target file, take its value, put it in the cache, and hand it out when needed. Looking at the object again, the value for d.js is a reference type, so anyone who obtains that reference can change it. If you don't want your module's value to be modified, you have to handle that yourself when writing the module, for example with Object.freeze() or Object.defineProperty().
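
The caching behaviour can be observed with Node's real require. The sketch below is only an illustration and makes a few assumptions: it writes a throwaway module to a temporary directory (the file name counter.js, the global counter __moduleRuns and the helper tryMutate are all made up for the demo), requires it twice, and checks that the file body ran once, that both calls return the same object held in require.cache, and that a frozen export cannot be changed by a consumer.

```javascript
const fs = require('fs');
const os = require('os');
const path = require('path');

// Write a throwaway module to a temp directory (hypothetical demo file).
const demoDir = fs.mkdtempSync(path.join(os.tmpdir(), 'require-cache-'));
const counterFile = path.join(demoDir, 'counter.js');
fs.writeFileSync(counterFile, [
  'global.__moduleRuns = (global.__moduleRuns || 0) + 1; // count executions',
  'module.exports = Object.freeze({ num: 1 });           // frozen export',
].join('\n'));

const first = require(counterFile);   // executes counter.js and caches it
const second = require(counterFile);  // served from the cache, the file is not run again

const sameObject = first === second;
const timesRun = global.__moduleRuns;
const cached = require.cache[require.resolve(counterFile)];

// Assigning to a frozen object throws a TypeError in strict mode.
function tryMutate(obj) {
  'use strict';
  try {
    obj.num = 7;
    return false;
  } catch (err) {
    return err instanceof TypeError;
  }
}
const mutationBlocked = tryMutate(first);

console.log(sameObject, timesRun, cached.exports === first, mutationBlocked);
```

Note that Object.freeze is shallow: nested objects inside the export would need to be frozen separately.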

Module type and loading order

This section covers some concepts. They are a little dry, but we need to understand them.

Module Type

There are several types of modules in Node.js. The ones we used above are file modules. In summary, there are two main types:

  1. Built-in modules: the functionality that Node.js provides natively, such as fs and http. These modules are loaded when the Node.js process starts.
  2. File modules: the modules we wrote above, as well as third-party modules, that is, the modules under node_modules, are all file modules.
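
To check which category a name falls into, Node exposes the list of built-in module names as builtinModules on the module module. A small sketch (lodash is used only as an example of a typical third-party name; it does not need to be installed):

```javascript
const { builtinModules } = require('module');

const fsIsBuiltin = builtinModules.includes('fs');
const httpIsBuiltin = builtinModules.includes('http');
// A third-party name is not in the list; it would be a file module under node_modules.
const lodashIsBuiltin = builtinModules.includes('lodash');

console.log(fsIsBuiltin, httpIsBuiltin, lodashIsBuiltin);
```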

Loading order

The loading order is the order in which X is looked up when we call require(X). The official documentation gives detailed pseudocode; in summary, the order is roughly:

  1. Built-in modules are loaded first; even if a file with the same name exists, the built-in module takes priority.
  2. If it is not a built-in module, look in the cache first.
  3. If it is not cached, look for a file at the corresponding path.
  4. If no such file exists, try to load the path as a folder.
  5. If neither a file nor a folder is found, look in node_modules.
  6. If it still cannot be found, throw an error.
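
Parts of this order can be inspected from user code with require.resolve and require.resolve.paths. The following is a rough sketch rather than a full trace of the algorithm; the package name used is made up and assumed not to be installed.

```javascript
// Built-in modules resolve to their own name and are not searched for on disk.
const builtinResolved = require.resolve('fs');
const builtinSearchPaths = require.resolve.paths('fs');   // null for built-in modules

// For a bare package name, Node returns the list of folders it would search.
// The package does not need to exist for paths() to work.
const packageSearchPaths =
  require.resolve.paths('some-hypothetical-package-that-does-not-exist');

// Last step of the order: if nothing is found anywhere, require throws.
let notFoundCode = null;
try {
  require('some-hypothetical-package-that-does-not-exist');
} catch (err) {
  notFoundCode = err.code;
}

console.log(builtinResolved, builtinSearchPaths, packageSearchPaths.length, notFoundCode);
```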

Load Folder

As mentioned above, if no file is found the path is treated as a folder. But an entire folder cannot be loaded as such, so loading a folder also follows an order:

  1. First check whether the folder contains a package.json. If it does, look for its main field, and if main has a value, load the corresponding file. So if you cannot find the entry point when reading the source of a third-party library, check the main field in its package.json; for example, jquery's main field is "main": "dist/jquery.js".
  2. If there is no package.json, or package.json has no main field, look for an index file.
  3. If neither step finds anything, throw an error.
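
Both branches of the folder lookup can be tried out with two throwaway folders. This is a sketch under assumptions: the folder and file names (with-main, with-index, dist/entry.js) are invented for the demo and created in a temporary directory.

```javascript
const fs = require('fs');
const os = require('os');
const path = require('path');

const root = fs.mkdtempSync(path.join(os.tmpdir(), 'require-folder-'));

// Folder 1: has a package.json whose main field points at the entry file.
const withMain = path.join(root, 'with-main');
fs.mkdirSync(path.join(withMain, 'dist'), { recursive: true });
fs.writeFileSync(path.join(withMain, 'package.json'),
  JSON.stringify({ main: 'dist/entry.js' }));
fs.writeFileSync(path.join(withMain, 'dist', 'entry.js'),
  "module.exports = 'loaded via main';");

// Folder 2: no package.json, so the index file is used instead.
const withIndex = path.join(root, 'with-index');
fs.mkdirSync(withIndex);
fs.writeFileSync(path.join(withIndex, 'index.js'),
  "module.exports = 'loaded via index';");

const fromMain = require(withMain);    // requires the folder, not a file
const fromIndex = require(withIndex);

console.log(fromMain, fromIndex);
```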

Supported file types

require mainly supports three file types:

  1. .js: the .js file is the most commonly used type. When it is loaded, the whole JS file is run first, and then the module.exports described above is used as the return value of require.
  2. .json: a .json file is an ordinary text file; it is converted to an object with JSON.parse and returned.
  3. .node: a .node file is a compiled binary addon written in C/C++. Pure front-end developers rarely come across this type.
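
For the .json case, a quick way to convince yourself is to compare what require returns with what JSON.parse gives for the same file. A small sketch, assuming a made-up config.json written to a temporary directory:

```javascript
const fs = require('fs');
const os = require('os');
const path = require('path');

const dir = fs.mkdtempSync(path.join(os.tmpdir(), 'require-json-'));
const jsonFile = path.join(dir, 'config.json');
fs.writeFileSync(jsonFile, '{ "name": "demo", "port": 8080 }');

const viaRequire = require(jsonFile);                            // parsed by the .json loader
const viaParse = JSON.parse(fs.readFileSync(jsonFile, 'utf8'));  // parsed by hand

console.log(viaRequire, viaParse);
```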

Writing require by hand

We have already explained the principles in detail. Now for the highlight: implementing require ourselves. Implementing require really means implementing the whole Node.js module loading mechanism. Let's look at the problems that need to be solved:

  1. Find the corresponding module file from the path passed in.
  2. Execute the found file, injecting module, require and the other methods and properties so that the module file can use them.
  3. Return the module's module.exports.

The handwritten code in this article follows the official Node.js source, and function and variable names are kept consistent with it where possible. It is in effect a simplified version of the source, so you can compare the two. For each method I will also give the address of the corresponding source. The overall code is in this file:
https://github.com/nodejs/node/blob/c6b96895cc74bc6bd658b4c6d5ea152d6e686d20/lib/internal/modules/cjs/loader.js

Module class

All of Node.js's module loading functionality lives in the Module class, and the code as a whole is written in an object-oriented style. If you are not very familiar with object-oriented JS, it is worth reading up on that first. The constructor of the Module class is not complicated; it mainly initializes some values. To distinguish it from the official Module, our own class is named MyModule:

const path = require('path');       // built-in module, used here to work out folder paths

function MyModule(id = '') {
  this.id = id;                     // the id is the path we pass to require
  this.path = path.dirname(id);     // the folder that contains the requested file
  this.exports = {};                // exported values go here; starts as an empty object
  this.filename = null;             // file name of the module
  this.loaded = false;              // whether the module has finished loading
}

The require method

The require we use all the time is actually an instance method of the Module class. It is very simple: it does some argument checks and then calls Module._load. The source is here: https://github.com/nodejs/node/blob/c6b96895cc74bc6bd658b4c6d5ea152d6e686d20/lib/internal/modules/cjs/loader.js#L970. The simplified code is:

MyModule.prototype.require = function (id) {
  return MyModule._load(id);
}

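MyModule._load

MyModule._load is a static method and the real body of require. What it does is:

  1. Check whether the requested module is already in the cache; if so, return the cached module's exports directly.
  2. If it is not in the cache, create a new Module instance, use it to load the module, and return the module's exports.

Let's implement these two requirements. The cache lives on the static variable Module._cache, which the official code initializes with Object.create(null) so that the created object's prototype is null. We will do the same:

MyModule._cache = Object.create(null);

MyModule._load = function (request) {    // request is the path argument passed in
  const filename = MyModule._resolveFilename(request);

  // Check the cache first; if the module is cached, return its exports directly
  const cachedModule = MyModule._cache[filename];
  if (cachedModule !== undefined) {
    return cachedModule.exports;
  }

  // Not cached, so load the module:
  // create a MyModule instance, call its load method,
  // and return module.exports once loading is done
  const module = new MyModule(filename);

  // Cache the module before calling load, so that a circular require gets this cached entry,
  // although its exports may still be empty or incomplete at that point
  MyModule._cache[filename] = module;

  module.load(filename);

  return module.exports;
}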

The source corresponding to the code above is here: https://github.com/nodejs/node/blob/c6b96895cc74bc6bd658b4c6d5ea152d6e686d20/lib/internal/modules/cjs/loader.js#L735

As you can see, the code above calls two more methods, MyModule._resolveFilename and MyModule.prototype.load. Let's implement them next.

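MyModule._resolveFilename

As the name suggests, MyModule._resolveFilename resolves the argument the user passed to require into the real file path. In the source this method is fairly complex because, as described earlier, it has to support many kinds of arguments: built-in modules, relative paths, absolute paths, folders, third-party modules and so on, and for folders and third-party modules it also has to resolve package.json and index.js. Here we are mainly interested in the principle, so we only implement lookup by relative and absolute path, with automatic completion of the .js and .json extensions:

const fs = require('fs');

MyModule._resolveFilename = function (request) {
  // Absolute path for the argument (relative paths resolve against the current working directory)
  const filename = path.resolve(request);
  const extname = path.extname(request);    // file extension, if any

  // No extension: try appending .js and .json
  if (!extname) {
    const exts = Object.keys(MyModule._extensions);
    for (let i = 0; i < exts.length; i++) {
      const currentPath = `${filename}${exts[i]}`;

      // If the file with the extension appended exists, return that path
      if (fs.existsSync(currentPath)) {
        return currentPath;
      }
    }
  }

  return filename;
}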

The code above also uses a static variable, MyModule._extensions, which stores the handler for each file type. We will implement it later.

The source corresponding to MyModule._resolveFilename is here: https://github.com/nodejs/node/blob/c6b96895cc74bc6bd658b4c6d5ea152d6e686d20/lib/internal/modules/cjs/loader.js#L822
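
To see the extension completion in action without the rest of the class, here is a stand-alone sketch of the same lookup. The function name resolveWithExtensions and the explicit exts parameter are assumptions made for the demo (in the article the list comes from Object.keys(MyModule._extensions)), and data.json is a throwaway file in a temporary directory.

```javascript
const fs = require('fs');
const os = require('os');
const path = require('path');

// Same idea as the simplified resolver, with the extension list passed in.
function resolveWithExtensions(request, exts) {
  const filename = path.resolve(request);
  if (path.extname(request)) return filename;    // extension given: use the path as-is
  for (const ext of exts) {
    const candidate = filename + ext;
    if (fs.existsSync(candidate)) return candidate;
  }
  return filename;                               // nothing matched: return it unchanged
}

const dir = fs.mkdtempSync(path.join(os.tmpdir(), 'resolve-demo-'));
fs.writeFileSync(path.join(dir, 'data.json'), '{}');

// 'data' has no extension: '.js' is tried first, then '.json' is found.
const completed = resolveWithExtensions(path.join(dir, 'data'), ['.js', '.json']);
const explicit = resolveWithExtensions(path.join(dir, 'data.json'), ['.js', '.json']);

console.log(completed, explicit);
```

The extensions are tried in the order they appear in the list, so when both data.js and data.json exist, data.js wins.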

MyModule.prototype.load

MyModule.prototype.load is an instance method and the one that actually loads the module. It is also the entry point for loading different file types: each file type corresponds to a method in MyModule._extensions:

MyModule.prototype.load = function (filename) {
  // Get the file extension
  const extname = path.extname(filename);

  // Call the handler registered for that extension
  MyModule._extensions[extname](this, filename);

  this.loaded = true;
}

Note that this in this code refers to the module instance, because load is an instance method. The corresponding source is here: https://github.com/nodejs/node/blob/c6b96895cc74bc6bd658b4c6d5ea152d6e686d20/lib/internal/modules/cjs/loader.js#L942
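
The dispatch itself is just a table lookup keyed by what path.extname returns. The toy below mirrors the shape of MyModule._extensions; the handlers table and describeLoad function are hypothetical names, and the handlers only return a description instead of loading anything.

```javascript
const path = require('path');

// A toy handler table keyed by extension, shaped like MyModule._extensions.
const handlers = {
  '.js': (file) => `run ${path.basename(file)} as JavaScript`,
  '.json': (file) => `parse ${path.basename(file)} as JSON`,
};

function describeLoad(filename) {
  const extname = path.extname(filename);   // '.js', '.json', or '' when there is none
  const handler = handlers[extname];
  if (!handler) {
    throw new Error(`No loader registered for "${extname}"`);
  }
  return handler(filename);
}

const jsResult = describeLoad('/tmp/demo/a.js');
const jsonResult = describeLoad('/tmp/demo/config.json');

console.log(jsResult);
console.log(jsonResult);
```

Unlike this sketch, the simplified load above does not check for a missing handler, so an unknown extension would fail with a TypeError when the undefined entry is called.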

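Loading js files: MyModule._extensions['.js']

As mentioned above, the handlers for the different file types are all attached to MyModule._extensions. Let's first implement loading of .js files:

MyModule._extensions = Object.create(null);   // handler table, keyed by file extension

MyModule._extensions['.js'] = function (module, filename) {
  const content = fs.readFileSync(filename, 'utf8');
  module._compile(content, filename);
}

As you can see, loading js is simple: read the file content, then call another instance method, _compile, to execute it. The corresponding source is here: https://github.com/nodejs/node/blob/c6b96895cc74bc6bd658b4c6d5ea152d6e686d20/lib/internal/modules/cjs/loader.js#L1098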

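Compiling and executing js files: MyModule.prototype._compile

MyModule.prototype._compile is the core of loading JS files and the method that gets used most. It has to take the target file and execute it, and before executing it the whole code must be wrapped in a function so that exports, require, module, __dirname and __filename can be injected. This is why we can use these variables directly in a JS file. The injection is not hard to implement. Suppose the file we require is a simple Hello World like this:

module.exports = "hello world";

How do we inject the module variable into it? The answer is to wrap it in a function at execution time, so that it becomes:

function (module) { // inject the module variable; the other variables work the same way
  module.exports = "hello world";
}

So if we treat the file content as a string, we need to concatenate a head and a tail onto it to turn it into the form above. We put the head and tail in an array:

MyModule.wrapper = [
  '(function (exports, require, module, __filename, __dirname) { ',
  '\n});'
];

Note that the head and tail add an extra pair of () around the function. This lets us get hold of the anonymous function and later call it with arguments by appending another (). Then the code to be executed is concatenated in between:

MyModule.wrap = function (script) {
  return MyModule.wrapper[0] + script + MyModule.wrapper[1];
};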
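
Node's own Module object exposes a wrap function of the same shape, so the wrapper can be inspected directly. A small sketch, assuming a Node version where Module.wrap is still available:

```javascript
const Module = require('module');

const source = 'module.exports = "hello world";';
const wrapped = Module.wrap(source);

// Prints the source surrounded by the function head and tail.
console.log(wrapped);
```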

Code wrapped by MyModule.wrap can therefore access exports, require, module, __filename and __dirname. With that, we can write MyModule.prototype._compile:

const vm = require('vm');

MyModule.prototype._compile = function (content, filename) {
  const wrapper = MyModule.wrap(content);    // the wrapped function source

  // vm is a built-in Node.js module for compiling and running code; runInThisContext
  // compiles and runs a string and returns its result. The wrapped source is a single
  // function expression, so compiledWrapper is a function
  const compiledWrapper = vm.runInThisContext(wrapper, {
    filename,
    lineOffset: 0,
    displayErrors: true,
  });

  // Prepare the arguments exports, require, module, __filename and __dirname:
  // exports is simply module.exports, i.e. this.exports
  // require is wrapped once more in the official source, but it ends up calling this.require
  // module is this
  // __filename is the filename argument passed in
  // __dirname is derived from filename
  const dirname = path.dirname(filename);

  compiledWrapper.call(this.exports, this.exports, this.require, this,
    filename, dirname);
}

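In the code above, pay attention to the arguments we inject and to the this passed through call:

  1. this: compiledWrapper is invoked through call, whose first argument becomes this inside the function. We pass this.exports, i.e. module.exports, so this inside a js file is a reference to module.exports.
  2. exports: the first formal parameter of compiledWrapper is exports, and we also pass this.exports, so exports inside a js file is a reference to module.exports as well.
  3. require: we pass this.require, which is MyModule.prototype.require, which in turn calls MyModule._load.
  4. module: we pass this, the instance of the current module.
  5. __filename: the absolute path of the file.
  6. __dirname: the absolute path of the folder that contains the file.

At this point our JS file has been loaded. The corresponding source is here: https://github.com/nodejs/node/blob/c6b96895cc74bc6bd658b4c6d5ea152d6e686d20/lib/internal/modules/cjs/loader.js#L1043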
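
The injection can be tried in isolation with nothing but vm. In this sketch the module object is a plain stand-in named fakeModule and the /virtual paths are invented; the source string records whether this equals module.exports so that the first point in the list can be checked.

```javascript
const vm = require('vm');

// "File content" held in a string; it checks this and sets an export.
const source =
  'exports.thisIsExports = (this === module.exports);' +
  ' module.exports.greeting = "hello world";';

const wrapped =
  '(function (exports, require, module, __filename, __dirname) { ' + source + '\n});';

// The wrapped text is one function expression, so running it yields a function.
const compiled = vm.runInThisContext(wrapped, { filename: '/virtual/hello.js' });

// A plain object standing in for a module instance.
const fakeModule = { exports: {} };
compiled.call(fakeModule.exports, fakeModule.exports, require, fakeModule,
  '/virtual/hello.js', '/virtual');

console.log(typeof compiled, fakeModule.exports);
```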

Loading json files: MyModule._extensions['.json']

Loading a json file is much simpler: read the file and parse it as JSON:

MyModule._extensions['.json'] = function (module, filename) {
  const content = fs.readFileSync(filename, 'utf8');
  module.exports = JSON.parse(content);
}

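The difference between exports and module.exports

People often ask online what the difference is between exports and module.exports in Node.js. Our handwritten code above already answers this, but let's go through it in detail. Both exports and module.exports are injected by this line of code:

compiledWrapper.call(this.exports, this.exports, this.require, this,
    filename, dirname);

Initially, exports and module.exports are the same empty object {}: exports is a reference to module.exports. If you only ever use them like this:

exports.a = 1;
module.exports.b = 2;

console.log(exports === module.exports);   // true

In the code above, exports and module.exports both point to the same object {}. Adding properties to that object does not change the reference to the object itself, so exports === module.exports still holds.

But if one day you write this:

exports = {
  a: 1
}

or this:

module.exports = {
    b: 2
}

then you have reassigned exports or module.exports and changed what it refers to. The link between the two is broken and they are no longer equal. Note that a reassigned module.exports becomes what the module exports, whereas reassigning exports does not change what the module exports; it only changes the exports variable, because the module is always module and what it exports is module.exports.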

Circular references

Node.js handles circular references. Here is the official example:

a.js:

console.log('a starting');
exports.done = false;
const b = require('./b.js');
console.log('in a, b.done = %j', b.done);
exports.done = true;
console.log('a done');

b.js:

console.log('b starting');
exports.done = false;
const a = require('./a.js');
console.log('in b, a.done = %j', a.done);
exports.done = true;
console.log('b done');

main.js:

console.log('main starting');
const a = require('./a.js');
const b = require('./b.js');
console.log('in main, a.done = %j, b.done = %j', a.done, b.done);

When main.js loads a.js, a.js in turn loads b.js. At that point, b.js tries to load a.js. To prevent an infinite loop, an unfinished copy of the a.js exports object is returned to the b.js module. b.js then finishes loading, and its exports object is provided to the a.js module.

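So how is this achieved? The answer is in our MyModule._load code. Note the order of these two lines:

MyModule._cache[filename] = module;

module.load(filename);

Here we set the cache entry first and only then perform the actual load. Following this, let's walk through the loading flow:

  1. main loads a; before actually loading, a takes its place in the cache.
  2. While a is loading, it loads b.
  3. b in turn loads a. By now a is already in the cache, so a.exports is returned directly, even though the exports is incomplete at this point.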
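
The flow above can be checked against Node's real require with a cut-down version of the official example. In this sketch the two files are written to a temporary directory, and instead of logging, each module records what it saw on its own exports (the property names bDoneSeenByA and aDoneSeenByB are made up for the demo).

```javascript
const fs = require('fs');
const os = require('os');
const path = require('path');

const dir = fs.mkdtempSync(path.join(os.tmpdir(), 'cycle-demo-'));

fs.writeFileSync(path.join(dir, 'a.js'), [
  'exports.done = false;',
  "const b = require('./b.js');",
  'exports.bDoneSeenByA = b.done;   // b has finished loading by now',
  'exports.done = true;',
].join('\n'));

fs.writeFileSync(path.join(dir, 'b.js'), [
  'exports.done = false;',
  "const a = require('./a.js');     // gets the unfinished exports of a from the cache",
  'exports.aDoneSeenByB = a.done;',
  'exports.done = true;',
].join('\n'));

const a = require(path.join(dir, 'a.js'));
const b = require(path.join(dir, 'b.js'));

console.log(a, b);
```

b saw a.done as false because a was only halfway through loading, while a saw b.done as true because b had finished by the time control returned to a.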

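Summary

  1. require is not black magic; the whole Node.js module loading mechanism is implemented in JS.
  2. The five parameters exports, require, module, __filename and __dirname in each module are not global variables; they are injected when the module is loaded.
  3. To inject these variables, the user's code is wrapped in a function: a string is concatenated and then compiled with the vm module.
  4. Initially, this, exports and module.exports inside a module all point to the same object; if you reassign any of them, the link is broken.
  5. A reassigned module.exports becomes what the module exports, whereas reassigning exports does not change what the module exports; it only changes the exports variable, because the module is always module and what it exports is module.exports.
  6. To handle circular references, a module is added to the cache before it is loaded, and a later load returns the cached entry directly; if the module has not finished loading at that point, you may get unfinished exports.
  7. The loading mechanism that Node.js implements is called CommonJS.

The complete code for this article is on GitHub: https://github.com/dennis-jiang/Front-End-Knowledges/blob/master/Examples/Node.js/Module/MyModule/index.js

References

Node.js module loading source: https://github.com/nodejs/node/blob/c6b96895cc74bc6bd658b4c6d5ea152d6e686d20/lib/internal/modules/cjs/loader.js

Node.js modules documentation: http://nodejs.cn/api/modules.html

Finally, thank you for taking the time to read this article. If it helped or inspired you, please don't be stingy with your likes and GitHub stars; your support is what keeps the author writing.

GitHub repository for the author's posts: https://github.com/dennis-jiang/Front-End-Knowledges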

source:segmentfault.com