File copy
NodeJS provides basic file operation APIs. Higher-level operations such as copying a file were not built in when this was written (newer versions add fs.copyFile), so a file copying program makes a good first exercise. Like the copy command, our program needs to accept two parameters: the source file path and the target file path.
Small file copy
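Using the built-in fs module of NodeJS, a simple implementation of this program looks as follows.

    var fs = require('fs');

    // Read the whole source file into memory, then write it out in one go.
    function copy(src, dst) {
        fs.writeFileSync(dst, fs.readFileSync(src));
    }

    function main(argv) {
        copy(argv[0], argv[1]);
    }

    // Pass only the user-supplied arguments (see the tip below).
    main(process.argv.slice(2));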
The above program uses fs.readFileSync to read the file content from the source path, and uses fs.writeFileSync to write the file content to the target path.
Tip: process is a global variable, and command line parameters are available through process.argv. Because argv[0] is always the absolute path of the NodeJS executable and argv[1] is always the absolute path of the main module, the first command line parameter is found at argv[2].
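The layout of process.argv described in the tip can be sketched as follows. The helper name userArgs and the paths in the sample array are made up for illustration; only the slice(2) step comes from the program above.

```javascript
// Hypothetical helper: drop the two fixed leading entries of an argv-style array.
function userArgs(argv) {
    return argv.slice(2);
}

// Illustrative array shaped like process.argv; these paths are invented.
var sample = ['/usr/local/bin/node', '/home/user/copy.js', 'a.txt', 'b.txt'];

console.log(userArgs(sample)); // [ 'a.txt', 'b.txt' ]
```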
Large file copy
The above program works fine for small files, but it reads the entire file into memory and then writes it to disk in one step. For large files this can exhaust the available memory. A large file has to be copied by reading a little and writing a little until the copy is complete, so the program needs to be modified as follows.

    var fs = require('fs');

    // Stream the file chunk by chunk instead of loading it all into memory.
    function copy(src, dst) {
        fs.createReadStream(src).pipe(fs.createWriteStream(dst));
    }

    function main(argv) {
        copy(argv[0], argv[1]);
    }

    main(process.argv.slice(2));
The above program uses fs.createReadStream to create a read-only data stream for the source file, uses fs.createWriteStream to create a write-only data stream for the target file, and connects the two streams with the pipe method. Put more abstractly, once the two are connected, data flows from the source to the target like water running through a pipe from one bucket to another.
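To make "read a little, write a little" concrete, here is a minimal sketch of a chunked copy written with the synchronous file-descriptor calls fs.openSync, fs.readSync and fs.writeSync. It is not how pipe works internally (streams are asynchronous and also handle backpressure). The helper name copyInChunks and the 64 KB default chunk size are assumptions chosen for illustration.

```javascript
var fs = require('fs');

// Hypothetical helper: copy src to dst one chunk at a time.
// Returns the total number of bytes copied.
function copyInChunks(src, dst, chunkSize) {
    var buffer = Buffer.alloc(chunkSize || 64 * 1024); // assumed default size
    var total = 0;
    var input = fs.openSync(src, 'r');

    try {
        var output = fs.openSync(dst, 'w');
        try {
            var bytesRead;
            // position = null means "continue from the current file position".
            while ((bytesRead = fs.readSync(input, buffer, 0, buffer.length, null)) > 0) {
                var written = 0;
                // writeSync reports how many bytes it wrote; loop until the chunk is done.
                while (written < bytesRead) {
                    written += fs.writeSync(output, buffer, written, bytesRead - written);
                }
                total += bytesRead;
            }
        } finally {
            fs.closeSync(output);
        }
    } finally {
        fs.closeSync(input);
    }

    return total;
}
```

Only one chunk is held in memory at any moment, which is the property that makes this approach suitable for large files.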
Traverse directories
Traversing directories is a common requirement when manipulating files. For example, a program that finds and processes all JS files in a specified directory has to traverse the entire directory.
Recursive algorithm
Recursive algorithms are generally used when traversing directories, otherwise it will be difficult to write concise code. Recursive algorithms are similar to mathematical induction in that they solve problems by continuously reducing their size. The following example illustrates this approach.
    function factorial(n) {
        if (n === 1) {
            return 1;
        } else {
            return n * factorial(n - 1);
        }
    }
The above function calculates the factorial of N (N!). As you can see, when N is greater than 1, the problem reduces to computing N times the factorial of N-1. When N equals 1, the problem has reached its minimum size and needs no further simplification, so 1 is returned directly.
Pitfall: Although code written with a recursive algorithm is concise, every level of recursion costs a function call. When performance has priority, the recursive algorithm should be converted into a loop to reduce the number of function calls.
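As an illustration of the conversion mentioned above, the factorial can be written as a loop. This is a sketch only; the name factorialLoop is not from the original text.

```javascript
// Hypothetical loop version of the recursive factorial: one call, no recursion.
function factorialLoop(n) {
    var result = 1;

    for (var i = 2; i <= n; i++) {
        result *= i;
    }

    return result;
}

console.log(factorialLoop(5)); // 120
```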
Traversal algorithm
A directory is a tree structure, and it is generally traversed with a depth-first, pre-order algorithm. Depth first means that after reaching a node, its child nodes are traversed before its neighbor nodes. Pre-order means that a node counts as visited the first time it is reached, rather than the last time the traversal returns to it. With this method, the traversal order of the tree below is A > B > D > E > C > F.

          A
         / \
        B   C
       / \   \
      D   E   F
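The order above can be checked with a small in-memory tree. This is a minimal sketch: the object layout (name and children fields) and the function name preorder are assumptions made for the example, not part of the fs API.

```javascript
// The example tree, written as nested objects (field names are illustrative).
var tree = {
    name: 'A',
    children: [
        { name: 'B', children: [
            { name: 'D', children: [] },
            { name: 'E', children: [] }
        ] },
        { name: 'C', children: [
            { name: 'F', children: [] }
        ] }
    ]
};

// Pre-order: visit the node first. Depth first: finish each child before its sibling.
function preorder(node, visit) {
    visit(node.name);
    node.children.forEach(function (child) {
        preorder(child, visit);
    });
}

var order = [];
preorder(tree, function (name) {
    order.push(name);
});

console.log(order.join(' > ')); // A > B > D > E > C > F
```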
Synchronous traversal
After understanding the necessary algorithms, we can simply implement the following directory traversal function.
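    var fs = require('fs');
    var path = require('path');

    function travel(dir, callback) {
        fs.readdirSync(dir).forEach(function (file) {
            var pathname = path.join(dir, file);

            if (fs.statSync(pathname).isDirectory()) {
                // Depth first: descend into the subdirectory before moving on.
                travel(pathname, callback);
            } else {
                callback(pathname);
            }
        });
    }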
As you can see, this function uses a directory as the starting point of the traversal. When a subdirectory is encountered, that subdirectory is traversed first. When a file is encountered, the path to the file (absolute if the starting directory was given as an absolute path) is passed to the callback function, which can then inspect and process the file as needed. Suppose we have the following directory:

    - /home/user/
        - foo/
            x.js
        - bar/
            y.js
        z.css

When traversing the directory using the following code, the output obtained is as follows.

    travel('/home/user', function (pathname) {
        console.log(pathname);
    });

    /home/user/foo/x.js
    /home/user/bar/y.js
    /home/user/z.css
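Coming back to the earlier example of finding all JS files in a directory, the callback can filter on the file extension. The following is a sketch under assumptions: findByExtension is a hypothetical helper, and the travel function is repeated from the text only so that the block runs on its own.

```javascript
var fs = require('fs');
var path = require('path');

// Same synchronous travel function as in the text, repeated for self-containment.
function travel(dir, callback) {
    fs.readdirSync(dir).forEach(function (file) {
        var pathname = path.join(dir, file);

        if (fs.statSync(pathname).isDirectory()) {
            travel(pathname, callback);
        } else {
            callback(pathname);
        }
    });
}

// Hypothetical helper: collect every file under dir whose extension equals ext.
// The result is sorted so it does not depend on the order readdirSync returns.
function findByExtension(dir, ext) {
    var found = [];

    travel(dir, function (pathname) {
        if (path.extname(pathname) === ext) {
            found.push(pathname);
        }
    });

    return found.sort();
}
```

For the sample directory above, findByExtension('/home/user', '.js') would return /home/user/bar/y.js and /home/user/foo/x.js, and skip z.css.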
Asynchronous traversal
If an asynchronous API is used to read the directory or the file status, the traversal function becomes somewhat more complicated to implement, but the principle is exactly the same. The asynchronous version of the travel function is as follows.

    var fs = require('fs');
    var path = require('path');

    function travel(dir, callback, finish) {
        fs.readdir(dir, function (err, files) {
            // Process the entries one at a time; next(i) handles entry number i.
            (function next(i) {
                if (i < files.length) {
                    var pathname = path.join(dir, files[i]);

                    fs.stat(pathname, function (err, stats) {
                        if (stats.isDirectory()) {
                            // Continue with the next entry once the subdirectory is done.
                            travel(pathname, callback, function () {
                                next(i + 1);
                            });
                        } else {
                            // The callback must call its second argument to continue.
                            callback(pathname, function () {
                                next(i + 1);
                            });
                        }
                    });
                } else {
                    finish && finish();
                }
            }(0));
        });
    }
The techniques for writing asynchronous traversal functions are not covered here; they are introduced in detail in subsequent chapters. For now, it is enough to see that asynchronous programming is considerably more complicated.
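One detail of the asynchronous version that is easy to miss: the callback receives a second argument and has to call it, otherwise the traversal stops at the first file. A usage sketch follows. The helper listFiles is hypothetical, travel is repeated from the text so the block runs on its own, and errors from fs.readdir and fs.stat are still ignored, as in the original.

```javascript
var fs = require('fs');
var path = require('path');

// Asynchronous travel function from the text (errors are not handled here).
function travel(dir, callback, finish) {
    fs.readdir(dir, function (err, files) {
        (function next(i) {
            if (i < files.length) {
                var pathname = path.join(dir, files[i]);

                fs.stat(pathname, function (err, stats) {
                    if (stats.isDirectory()) {
                        travel(pathname, callback, function () {
                            next(i + 1);
                        });
                    } else {
                        callback(pathname, function () {
                            next(i + 1);
                        });
                    }
                });
            } else {
                finish && finish();
            }
        }(0));
    });
}

// Hypothetical helper: collect every file path, then hand the list to done.
function listFiles(dir, done) {
    var found = [];

    travel(dir, function (pathname, next) {
        found.push(pathname);
        next(); // tell travel to move on to the next entry
    }, function () {
        done(found);
    });
}

// Usage (not executed here):
// listFiles('/home/user', function (files) { console.log(files); });
```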