How webpack does persistent caching

Preface

Recently I have been looking into how webpack implements persistent caching and found that there are still a few pitfalls. I finally had time to sort them out and summarize them. After reading this article, you should roughly understand:

What persistent caching is, and why we need it.
How webpack does persistent caching.
Some points to note when caching with webpack.

Persistent caching

First, what is persistent caching? Now that applications with separated front ends and back ends are popular, front-end HTML, CSS, and JS usually live on the server as static resource files, and dynamic content is fetched through data interfaces. That raises the question of how a company deploys front-end code, and in particular how it deploys updates: should the page be deployed first, or the resources?

Deploy the page first, then the resources: if a user visits during the interval between the two deployments, the new page structure loads the old resources and caches that old version as if it were the new one. The user sees a page with broken styling, and unless they refresh manually, the page stays broken until the cached resource expires.

Deploy the resources first, then the page: during the deployment interval, a user who has an older version of the resources cached locally still gets the old page; since the resource references have not changed, the browser uses its local cache and everything works. But a user with no local cache, or an expired one, gets the old page loading the new resources, which causes execution errors on the page.

So we need a deployment strategy to ensure that when we update our online code, online users can transition smoothly and open our website correctly.

It is recommended to read this answer first: How to develop and deploy front-end code in a large company?

After reading that answer, you will see that the more mature persistent caching solution today is to append a hash to the file name of each static resource. Because the hash changes whenever the file content changes, each release publishes new files incrementally instead of overwriting the previous ones, so users who are online during a release do not hit broken requests.
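
The naming idea can be sketched in a few lines of Node. This is only an illustration, not how webpack computes its hashes: the helper `hashedName`, the choice of sha256, and the 8-character length are all assumptions made for the example.

```javascript
const crypto = require('crypto');

// Hypothetical helper: derive a cache-busting file name from the file content.
function hashedName(name, ext, content, length = 8) {
  const digest = crypto.createHash('sha256').update(content).digest('hex');
  return `${name}.${digest.slice(0, length)}.${ext}`;
}

const v1 = hashedName('app', 'js', 'console.log("v1");');
const v1Again = hashedName('app', 'js', 'console.log("v1");');
const v2 = hashedName('app', 'js', 'console.log("v2");');

// Same content gives the same name; changed content gives a new name,
// so a new release never overwrites the file an old page still references.
console.log(v1 === v1Again, v1 === v2);
```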

Because the names of the static resources (css, js, img) are unique in every release, I can:

For HTML files: do not enable caching. Serve the HTML from your own server with caching turned off; that server only provides HTML files and data interfaces.
For static JS, CSS, images, and similar files: enable a CDN and caching. Upload the static resources to a CDN provider and cache them long-term. Because every resource path is unique, nothing gets overwritten, which keeps access stable for online users.
On every release, upload the static resources (js, css, img) to the CDN first, then upload the HTML file. This way existing users can keep using the site normally and new users see the new page.

The above briefly introduces the mainstream front-end persistent caching solution, so why do we need to do persistent caching?

When a user visits our site for the first time, the page pulls in a variety of static resources. With persistent caching in place, we can set the Cache-Control or Expires field in the HTTP response headers so that the browser caches these resources locally.

If the user requests the same static resource again on a later visit and it has not expired, the browser reads it straight from the local cache instead of fetching it over the network.
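
As a rough sketch of the header policy described above, the function below picks a Cache-Control value from the request path. The function name `cacheHeaders`, the file-name pattern, and the one-year lifetime are assumptions for illustration; a real server or CDN would be configured in its own way.

```javascript
// Assumption: hashed assets look like "name.<8 hex chars>.ext".
const HASHED_ASSET = /\.[0-9a-f]{8}\.(js|css|png|jpg|gif|svg)$/;

// Hypothetical policy function: choose caching headers from the URL path.
function cacheHeaders(urlPath) {
  if (HASHED_ASSET.test(urlPath)) {
    // The name changes whenever the content changes, so a long lifetime is safe.
    return { 'Cache-Control': 'public, max-age=31536000, immutable' };
  }
  // HTML (and anything without a hash) is revalidated on every visit.
  return { 'Cache-Control': 'no-cache' };
}

console.log(cacheHeaders('/js/pageA.d178426d.js'));
console.log(cacheHeaders('/index.html'));
```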

How webpack does persistent caching

With persistent caching introduced, here is the main topic: how do we do persistent caching in webpack? We need two things:

Uniqueness of the hash: generate a unique hash for each packaged resource, so that whenever the packaged content differs, the hash differs.
Stability of the hash: when a module is modified, only the hashes of the affected output files change; the hashes of output files unrelated to that module stay the same.

Hashed file names are the first step toward persistent caching. webpack currently offers two ways to compute the hash: [hash] and [chunkhash].

hash is a single value that webpack generates for each compilation. It is recomputed whenever any file in the project changes, so every output file gets a new name.
chunkhash is computed per chunk from the chunk's content, so changing a file only affects the hash of the chunks that contain it and leaves other output files alone.

So if you simply bundle everything into one file, hash is enough. If your project involves code splitting, on-demand loading, and so on, you need chunkhash so that only the relevant file hashes change after each update.
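
The difference can be seen in a toy model. The code below does not call webpack; it treats a build as a map from chunk name to source text and uses sha256 as a stand-in hash, both of which are assumptions for illustration.

```javascript
const crypto = require('crypto');

const digest = (text) =>
  crypto.createHash('sha256').update(text).digest('hex').slice(0, 8);

// Toy model of a build: chunk name -> concatenated source of its modules.
function fileNames(chunks) {
  // [hash]: one value for the whole compilation.
  const buildHash = digest(
    Object.keys(chunks).sort().map((name) => name + chunks[name]).join('\n')
  );
  const result = {};
  for (const [name, source] of Object.entries(chunks)) {
    result[name] = {
      withHash: `${name}.${buildHash}.js`,
      // [chunkhash]: one value per chunk.
      withChunkhash: `${name}.${digest(source)}.js`,
    };
  }
  return result;
}

const before = fileNames({ pageA: 'A1', pageB: 'B1' });
const after = fileNames({ pageA: 'A2', pageB: 'B1' }); // only pageA changed

// With [hash], pageB is renamed even though it did not change.
console.log(before.pageB.withHash === after.pageB.withHash);
// With [chunkhash], pageB keeps its name and stays cached.
console.log(before.pageB.withChunkhash === after.pageB.withChunkhash);
```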

So our webpack configuration with persistent cache should look like this:

module.exports = {
  entry: __dirname + '/src/index.js',
  output: {
    path: __dirname + '/dist',
    // [name] is the chunk name; [chunkhash:8] is the first 8 characters of the chunk's hash
    filename: '[name].[chunkhash:8].js',
  },
};

This configuration uses index.js as the entry point and bundles all the code into a single file in the dist directory. Because the entry is a plain string, the chunk gets webpack's default name main, so the file is named main.xxxxxxxx.js. Now every update to the project produces a file with a new name.

That is enough for simple scenarios, but in large multi-page applications we often need to optimize page performance further:

Separate business code from third-party code: business code changes frequently while third-party code (libraries, frameworks) is updated slowly, so splitting the third-party code out lets the browser cache it for a long time.
On-demand loading: with React-Router, for example, the component for a route is loaded only when the user visits that route, so the user does not have to download every route component up front.
In multi-page applications we can usually extract common modules such as the header and footer, so that when the user navigates between pages these modules are already in the cache and load directly without a network request.

To split bundles and load modules on demand, we use webpack's built-in plugin CommonsChunkPlugin. Below I use an example to explain how to configure webpack.

The code of this article is placed on my Github. If you are interested, you can download it and take a look:

git clone https://github.com/happylindz/blog.git
cd blog/code/multiple-page-webpack-demo
npm install

Before reading on, I strongly recommend my previous article, "An In-Depth Understanding of the webpack File Packaging Mechanism". Understanding how webpack packages files will help you implement persistent caching.

The example looks roughly like this: it consists of two pages, pageA and pageB.

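// src/pageA.js
import componentA from './common/componentA';
// Uses the third-party library jquery; extract it so the business bundle does not grow too large
import $ from 'jquery';
// Loads CSS files; some styles are shared and some are page-specific, so they need to be extracted
import './css/common.css';
import './css/pageA.css';

console.log(componentA);
console.log($.trim(' do something '));

// src/pageB.js
// Pages A and B both use the common module componentA; extract it to avoid loading it twice
import componentA from './common/componentA';
import componentB from './common/componentB';
import './css/common.css';
import './css/pageB.css';

console.log(componentA);
console.log(componentB);
// Uses the asynchronously loaded module asyncComponent; extract it to speed up the first screen
document.getElementById('xxxxx').addEventListener('click', () => {
  import(/* webpackChunkName: "async" */ './common/asyncComponent.js').then((module) => {
    // The common modules export a string as default, so log it rather than call it
    console.log(module.default);
  });
});

// The common modules (componentA, componentB, asyncComponent) basically look like this
export default "component X";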

The pages above cover the three ways we split modules: extracting third-party libraries, loading on demand, and extracting common modules. The next step is to configure webpack:

const path = require('path');
const webpack = require('webpack');
const ExtractTextPlugin = require('extract-text-webpack-plugin');

module.exports = {
  entry: {
    pageA: [path.resolve(__dirname, './src/pageA.js')],
    pageB: path.resolve(__dirname, './src/pageB.js'),
  },
  output: {
    path: path.resolve(__dirname, './dist'),
    filename: 'js/[name].[chunkhash:8].js',
    chunkFilename: 'js/[name].[chunkhash:8].js',
  },
  module: {
    rules: [
      {
        // Regular expression that matches the CSS files this loader should transform
        test: /\.css$/,
        use: ExtractTextPlugin.extract({
          fallback: 'style-loader',
          use: ['css-loader'],
        }),
      },
    ],
  },
  plugins: [
    // Modules used by two or more chunks go into the common chunk
    new webpack.optimize.CommonsChunkPlugin({
      name: 'common',
      minChunks: 2,
    }),
    // .js modules that come from node_modules go into the vendor chunk
    new webpack.optimize.CommonsChunkPlugin({
      name: 'vendor',
      minChunks: ({ resource }) => (
        resource && resource.indexOf('node_modules') >= 0 && resource.match(/\.js$/)
      ),
    }),
    // Extract CSS from the JS bundles into separate files
    new ExtractTextPlugin({
      filename: 'css/[name].[chunkhash:8].css',
    }),
  ],
};

The first CommonsChunkPlugin extracts common modules. It is like telling webpack: boss, if you see a module used by two or more chunks, please move it into the common chunk. Here minChunks is 2, the finest granularity; choose how many chunks must use a module before it is extracted according to your own situation.

The second CommonsChunkPlugin extracts third-party code by checking whether a resource comes from node_modules; if it does, it is a third-party module and gets extracted. It is like telling webpack: boss, if you see modules that come from the node_modules directory and whose names end in .js, please move them into the vendor chunk, and create the vendor chunk if it does not exist yet.
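
The predicate passed to minChunks for the vendor chunk can be tried in isolation. In this sketch the function name `isVendorModule` and the sample paths are hypothetical, and the extension test is written as /\.js$/ so that the dot is matched literally.

```javascript
// Decide whether a module belongs in the vendor chunk.
function isVendorModule({ resource }) {
  return Boolean(
    resource && resource.indexOf('node_modules') >= 0 && /\.js$/.test(resource)
  );
}

// Hypothetical module descriptors; only the `resource` field matters here.
const samples = [
  { resource: '/project/node_modules/jquery/dist/jquery.js' },
  { resource: '/project/src/common/componentA.js' },
  { resource: '/project/node_modules/normalize.css/normalize.css' },
  { resource: undefined },
];

const verdicts = samples.map(isVendorModule);
console.log(verdicts); // [ true, false, false, false ]
```

Only the jquery file qualifies: the business module is outside node_modules, the stylesheet does not end in .js, and a module without a resource path is skipped.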

What is the benefit of this configuration? As the business grows, we are likely to depend on more and more third-party libraries. If we instead configured a dedicated entry to hold third-party code, every new dependency would have to be added by hand, and webpack.config.js would become:

// Hard to extend: every new dependency must be listed by hand
module.exports = {
  entry: {
    app: './src/main.js',
    vendor: [
      'vue',
      'axios',
      'vue-router',
      'vuex',
      // more
    ],
  },
};

The third plugin, ExtractTextPlugin, extracts the CSS from the packaged JS files into standalone CSS files. Imagine you only change a style without touching the page's logic: you certainly do not want the hash of your JS file to change. You want CSS and JS to be separate and not affect each other.

After running webpack, you can see the packaged output:

├── css
│ ├── common.2beb7387.css
│ ├── pageA.d178426d.css
│ └── pageB.33931188.css
└── js
 ├── async.03f28faf.js
 ├── common.2beb7387.js
 ├── pageA.d178426d.js
 ├── pageB.33931188.js
 └── vendor.22a1d956.js

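As you can see, the CSS and JS are now separated and the modules have been split, which guarantees the uniqueness of each chunk: every time you update the code, a different hash is generated.

With uniqueness in place, we need to guarantee the stability of the hash. Imagine this scenario: you certainly do not want a change to one part of the code (a module, some CSS) to change the hash of every file. That would clearly be unwise, so how do we keep hash changes to a minimum?

In other words, we need to find the factors in a webpack build that invalidate the cache, and then work out how to solve or reduce them.

Changes to chunkhash are mainly caused by the following four things:

The source code of the modules the chunk contains
The runtime code webpack uses to bootstrap
The module ids generated by webpack (both the ids of the contained modules and the ids of the dependencies they reference)
The chunk id

If any of these four changes, the generated chunk file is different and the cache is invalidated. The sections below cover each of the four in turn.

1. Source code changes:

This needs no explanation: the cache must be refreshed, otherwise something is wrong.

2. The runtime code webpack uses to bootstrap:

If you have read my previous article, "An In-Depth Understanding of the webpack File Packaging Mechanism", you know that webpack needs to execute some bootstrap code when it starts.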

(function(modules) {
 window["webpackJsonp"] = function webpackJsonpCallback(chunkIds, moreModules) {
 // ...
 };
 function __webpack_require__(moduleId) {
 // ...
 }
 __webpack_require__.e = function requireEnsure(chunkId, callback) {
 // ...
 script.src = __webpack_require__.p + "" + chunkId + "." + ({"0":"pageA","1":"pageB","3":"vendor"}[chunkId]||chunkId) + "." + {"0":"e72ce7d4","1":"69f6bbe3","2":"9adbbaa0","3":"53fa02a7"}[chunkId] + ".js";
 };
})([]);

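It looks roughly like the above: this is webpack's bootstrap code, a set of functions that tell the browser how to load the modules webpack defines.

One line in it changes on every update, because the bootstrap code must know exactly which chunkhash corresponds to which chunk id; only then can it build the correct path of an asynchronous JS file when loading it.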
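
To make the dependency concrete, here is a toy version of that lookup. The two maps are copied from the snippet above; the names `chunkUrl` and `runtimeSource` and the public path '/dist/' are assumptions for the example, not webpack APIs.

```javascript
// Maps copied from the runtime snippet: chunk id -> name, chunk id -> chunkhash.
const chunkNames = { 0: 'pageA', 1: 'pageB', 3: 'vendor' };
const chunkHashes = { 0: 'e72ce7d4', 1: '69f6bbe3', 2: '9adbbaa0', 3: '53fa02a7' };

// Build a chunk URL the same way the script.src line does.
function chunkUrl(publicPath, chunkId) {
  const name = chunkNames[chunkId] || chunkId;
  return `${publicPath}${chunkId}.${name}.${chunkHashes[chunkId]}.js`;
}

// The hash map is embedded in the runtime text, so the runtime's own content
// changes whenever any chunk's hash changes.
const runtimeSource = (hashes) => `var hashes = ${JSON.stringify(hashes)};`;

console.log(chunkUrl('/dist/', 0)); // /dist/0.pageA.e72ce7d4.js
console.log(
  runtimeSource(chunkHashes) === runtimeSource({ ...chunkHashes, 0: 'aaaaaaaa' })
); // false
```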

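So which file does this code end up in? In the configuration above the common chunk is the one generated last, so the runtime code is built directly into it. The result is that every time we update business code (pageA, pageB, or a module), the chunkhash of common changes. That is clearly not what we intended: the common chunk is only supposed to hold common modules (here, componentA), and if I have not touched componentA, why should its chunkhash change?

So we need to extract this runtime code into a separate file.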

module.exports = {
  // ...
  plugins: [
    // ...
    // Place this after the other CommonsChunkPlugin instances
    new webpack.optimize.CommonsChunkPlugin({
      name: 'runtime',
      minChunks: Infinity,
    }),
  ],
};

This tells webpack to extract the runtime code and put it in a separate file.

├── css
│ ├── common.4cc08e4d.css
│ ├── pageA.d178426d.css
│ └── pageB.33931188.css
└── js
 ├── async.03f28faf.js
 ├── common.4cc08e4d.js
 ├── pageA.d178426d.js
 ├── pageB.33931188.js
 ├── runtime.8c79fdcd.js
 └── vendor.cef44292.js

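There is now an extra runtime.xxxx.js. From now on, when you change business code the hash of the common chunk no longer changes; the hash of the runtime chunk changes instead. Since this code is dynamic, you can use chunk-manifest-webpack-plugin to inline it into the HTML and save one network request.

3. The module ids generated by webpack:

webpack 2 loads the OccurrenceOrderPlugin by default. This plugin sorts modules by how often they are imported: the more often a module is imported, the smaller its module id. This is still unstable, though. Frequently imported modules get small ids that are less likely to change, but as the code base grows nothing is guaranteed.

By default, a module's id is its index in the module array. OccurrenceOrderPlugin puts frequently referenced modules first, so the order is the same on every compilation of the same code, but if you add or remove modules while editing, the ids of all modules may be affected.

The best practice is to use HashedModuleIdsPlugin, which generates a string of only four characters from the module's relative path and uses it as the module id. This both hides the module's path information and keeps the module id short.

With this in place, the only thing that changes a module id is a change to its file path. As long as the file path stays the same, the four-character string stays the same and so does the hash. Adding or removing business modules has no effect on the ids of other modules.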

module.exports = {
  plugins: [
    // Put this first in the list
    new webpack.HashedModuleIdsPlugin(),
    // ...
  ],
};
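
The contrast between index-based and path-based ids can be sketched without webpack. The helpers `indexIds` and `pathIds` are hypothetical, and sha256 with base64 output is only a stand-in for whatever hash the plugin uses internally.

```javascript
const crypto = require('crypto');

// Index-based ids: a module's id is its position in the module list.
const indexIds = (paths) =>
  Object.fromEntries(paths.map((p, i) => [p, i]));

// Path-based ids: a four-character hash of the relative path.
// Assumption: sha256 + base64 stands in for the plugin's real hash function.
const pathIds = (paths) =>
  Object.fromEntries(
    paths.map((p) => [
      p,
      crypto.createHash('sha256').update(p).digest('base64').slice(0, 4),
    ])
  );

const before = ['./src/common/componentA.js', './src/pageA.js'];
// A new module is inserted ahead of the existing ones.
const after = ['./src/common/banner.js', ...before];

// The index-based id of pageA shifts when a module is added.
console.log(indexIds(before)['./src/pageA.js'], indexIds(after)['./src/pageA.js']); // 1 2
// The path-based id of pageA stays the same.
console.log(pathIds(before)['./src/pageA.js'] === pathIds(after)['./src/pageA.js']); // true
```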

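4. The chunk id:

In practice the number and order of chunks are mostly fixed between compilations and do not change easily.

What is covered here is only fairly basic module splitting; some other cases are not considered. For example, when asynchronously loaded components contain common modules, those modules can be extracted once more into an asynchronous common chunk. If you want to dig deeper, see the article "Webpack 大法之 Code Splitting".

Some points to note when caching with webpack

The CSS file hash does not update
Using DllPlugin for production releases is not recommended

The CSS file hash does not update:

ExtractTextPlugin has a fairly serious problem: the [chunkhash] it uses in the file name is taken directly from the JS chunk that references the CSS. In other words, if I only modify the CSS and leave the JS untouched, the generated CSS file name still does not change.

So we need to change chunkhash to contenthash in ExtractTextPlugin. As the name suggests, contenthash is the hash of the extracted text file's content, that is, the hash of the style file alone. The compiled JS and CSS files then have independent hashes.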

module.exports = {
  plugins: [
    // ...
    new ExtractTextPlugin({
      filename: 'css/[name].[contenthash:8].css',
    }),
  ],
};

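If you are using webpack 2 or webpack 3, congratulations, this is enough: changes to JS files and CSS files do not affect each other's hashes. If you are using webpack 1, however, there is a problem.

Specifically, webpack 1 and webpack 2 compute chunkhash differently:

webpack 1 was designed without considering that a plugin like ExtractTextPlugin would extract module content, so it computes chunkhash from the module content before packaging. The CSS content is therefore included in the calculation, and only afterwards is the CSS extracted into a separate file.

The result: if you modify only a CSS file and not the JS file that references it, the hash of the compiled JS file changes as well.

webpack 2 improved this. It computes the hash from the content of the packaged file, that is, after ExtractTextPlugin has extracted the CSS, so the problem above does not occur. If you are unlucky enough to still be on webpack 1, I recommend the md5-hash-webpack-plugin plugin, which changes how webpack computes the hash.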
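
The two strategies can be compared in a toy model. Nothing here is webpack code: the chunk is a plain object, `jsFileHash` is a hypothetical helper, and sha256 is a stand-in hash. The model only mirrors the behaviour as described above.

```javascript
const crypto = require('crypto');

const digest = (text) =>
  crypto.createHash('sha256').update(text).digest('hex').slice(0, 8);

// Toy chunk: the JS and CSS that belong to one entry.
// "pre-extraction" hashes everything the chunk contains (the webpack 1
// behaviour described above); "post-extraction" hashes only what remains in
// the emitted JS file after the CSS has been pulled out.
function jsFileHash(chunk, strategy) {
  const input = strategy === 'pre-extraction' ? chunk.js + chunk.css : chunk.js;
  return digest(input);
}

const v1 = { js: 'console.log("pageA");', css: 'body { color: red; }' };
const v2 = { ...v1, css: 'body { color: blue; }' }; // CSS-only change

// Pre-extraction: the JS file is renamed although its content is identical.
console.log(jsFileHash(v1, 'pre-extraction') === jsFileHash(v2, 'pre-extraction'));
// Post-extraction: the JS file keeps its name.
console.log(jsFileHash(v1, 'post-extraction') === jsFileHash(v2, 'post-extraction'));
```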

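Using DllPlugin for production releases is not recommended

Why do I say this? Recently a friend asked me why their team lead would not let them use DllPlugin in production.

DllPlugin itself has several drawbacks:

First, you have to maintain an additional webpack configuration, which adds work.
If one page uses a very large third-party library that the other pages do not need at all, bundling it straight into dll.js is not worth it: every page load has to fetch that unused code, and you cannot use webpack 2's Code Splitting.
The dll file has to be downloaded on the first visit. Because many libraries are bundled together, the dll file is large and the first page load is slow.

You can bundle a dll file and let the browser read it from cache so that it is not requested again next time. But if, say, you use a single lodash function, the dll contains the whole of lodash, so you load too much unused code, which hurts first-screen rendering time.

In my view, the right approach is:

Libraries that work as an integrated whole, such as React and Vue, can go into a vendor bundle for caching. Your technology stack is usually fixed and a site generally uses the same stack throughout, so build a vendor bundle and cache it.
Utility and component libraries such as antd and lodash can be trimmed with tree shaking so that only the code in use is kept. Do not bundle them whole into the vendor bundle, or you will end up executing a large amount of unused code.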
