Table of Contents
1. File reading and writing in node
1.1 Regular file reading and writing
1.2 Stream file reading and writing
2. Node file read and write RAM and Blob size restrictions
2.1 Basic question
2.2 Segmented reading
3. Others
3.1 Reading and writing large files on the browser side
3.2 Requesting large static resource files

A brief analysis of how Nodejs reads and writes large files

Sep 28, 2022, 8:09 PM

The author has recently been doing some file reading, writing, and multi-part upload work in Node. During this process, I found that if a file read by Node exceeds 2 GB, which is beyond the maximum size of a Blob, a read exception occurs. In addition, reading and writing files in Node is also limited by the server's RAM, so large files need to be read in slices. This article records the problems encountered and the process of solving them. [Related tutorial recommendation: nodejs video tutorial]

  • File reading and writing in node
  • Node file reading and writing RAM and Blob size restrictions
  • Others

1. File reading and writing in node

1.1 Regular file reading and writing

Normally, if we want to read a relatively small file, we can read it directly:

const fs = require('fs')
let data = fs.readFileSync("./test.png")
console.log(data, 123)
// output: data = <Buffer 89 50 4e ...>

Generally speaking, the synchronous method is not recommended, because JavaScript/Node.js is single-threaded and synchronous methods block the main thread. Recent versions of Node provide fs.promises, which can be used directly in combination with async/await:

const fs = require('fs')
const readFileAsync = async () => {
    let data = await fs.promises.readFile("./test.png")
    console.log(data, 123)
}
readFileAsync()
// output: data = <Buffer 89 50 4e ...>

The asynchronous call here does not block the main thread, and the I/O of multiple file reads can run in parallel.

1.2 Stream file reading and writing

With conventional file reading, we read the whole file into memory at once, and both the time efficiency and the memory efficiency of this approach are low. Low time efficiency means the file must be read completely before any processing can begin. Low memory efficiency means the entire file is loaded into memory at once, which takes up a lot of memory. Therefore, in this case we generally use a Stream to read files:

const fs = require('fs')
const readFileTest = () => {
    var data = ''
    var rs = fs.createReadStream('./test.png');
    rs.on('data', function(chunk) {
        data += chunk;
        console.log(chunk)
    });
    rs.on('end', function() {
        console.log(data);
    });
    rs.on('error', function(err) {
        console.log(err.stack);
    });
}
readFileTest()
// each chunk is a Buffer such as <Buffer 89 50 64 ...>; data accumulates their contents as a string

Reading and writing files through a Stream can improve both memory efficiency and time efficiency.

  • Memory efficiency: there is no need to load a large amount of data (or all of it) into memory before processing it.
  • Time efficiency: processing can start as soon as the first chunk arrives, which greatly reduces the time before data processing begins, instead of waiting until the entire file is loaded.

Streams also support a second reading style, based on the 'readable' event:

const fs = require('fs')
const readFileTest = () => {
    var data = ''
    var chunk;
    var rs = fs.createReadStream('./test.png');
    rs.on('readable', function() {
        while ((chunk = rs.read()) != null) {
            data += chunk;
        }
    });
    rs.on('end', function() {
        console.log(data)
    });
};
readFileTest()

2. Node file read and write RAM and Blob size restrictions

2.1 Basic question

When reading large files, there is a limit on the size of the file that can be read. For example, suppose we read a 2.5 GB video file:

const fs = require('fs')
const readFileTest = async () => {
    let data = await fs.promises.readFile("./video.mp4")
    console.log(data)
}
readFileTest()

Executing the above code will report an error:

RangeError [ERR_FS_FILE_TOO_LARGE]: File size (2246121911) is greater than 2 GB

We might think that setting NODE_OPTIONS='--max-old-space-size=5000' would help, since 5000 MB > 2.5 GB, but the error still does not disappear. This means the size limit on files read by Node cannot be changed through options.

The above is the conventional way to read a large file. Is there a file size limit when reading through a Stream? Let's try:

const fs = require('fs')
const readFileTest = () => {
    var data = ''
    var rs = fs.createReadStream('./video.mp4');
    rs.on('data', function(chunk) {
        data += chunk;
     });
    rs.on('end',function(){
        console.log(data);
    });
    rs.on('error', function(err){
        console.log(err.stack);
     });
}
readFileTest()

Reading the 2.5 GB file this way raises no exception from the stream itself, but note that an error still occurs here:

data += chunk;
                ^

RangeError: Invalid string length

This is because the length of data exceeds the maximum string length allowed by V8 (on the order of 512 MB to 1 GB, depending on the Node version). Therefore, when processing with a Stream, if you save the read results, pay attention to the file size: it must not exceed the maximum length of a string or Buffer. In the above case, we do not actually need data += chunk to collect all the data into one large variable; we can read and process at the same time.

2.2 Segmented reading

During file reading, createReadStream can actually read in segments, and this segmented reading can be used as an alternative for large files. Especially when reading concurrently, it has certain advantages and can improve the speed of reading and processing files.

createReadStream accepts a second parameter {start, end}. We can get the size of the file through fs.promises.stat, then determine the fragments, and finally read each fragment once, for example:

  1. Get the file size:

const info = await fs.promises.stat(filepath)
const size = info.size

  2. Slice according to a specified SIZE (for example, 128 MB per slice):

const SIZE = 128 * 1024 * 1024
let sizeLen = Math.floor(size / SIZE)
let total = sizeLen + 1
for (let i = 0; i <= sizeLen; i++) {
    if (sizeLen === i) {
        console.log(i * SIZE, size, total, 123)
        readStremfunc(i * SIZE, size, total)
    } else {
        console.log(i * SIZE, (i + 1) * SIZE, total, 456)
        readStremfunc(i * SIZE, (i + 1) * SIZE - 1, total)
    }
}
// slices after splitting: [0, 128M], [128M, 256M] ...

3. Implement the read function

const readStremfunc = (start, end, total) => {
    const readStream = fs.createReadStream(filepath, { start: start, end: end })
    readStream.setEncoding('binary')
    let data = ''
    readStream.on('data', chunk => {
        data = data + chunk
    })
    readStream.on('end', () => {
        // ...
    })
}
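The slicing loop above can be factored into a small pure helper (the name makeRanges is my own) that returns inclusive {start, end} pairs of at most sliceSize bytes each:

```javascript
// Split a file of `size` bytes into inclusive byte ranges of at most
// `sliceSize` bytes each, e.g. [0, 128M-1], [128M, 256M-1], ...
const makeRanges = (size, sliceSize) => {
  const ranges = []
  for (let start = 0; start < size; start += sliceSize) {
    ranges.push({ start, end: Math.min(start + sliceSize - 1, size - 1) })
  }
  return ranges
}

console.log(makeRanges(10, 4))
// [ { start: 0, end: 3 }, { start: 4, end: 7 }, { start: 8, end: 9 } ]
```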

It is worth noting that in fs.createReadStream(filepath, { start, end }), start and end are both inclusive. For example, fs.createReadStream(filepath, { start: 0, end: 1023 }) reads [0, 1023], a total of 1024 bytes.

3. Others

3.1 Reading and writing large files on the browser side

The large files above were read in Node.js, so is there any problem reading large files on the browser side?

    When the browser reads large files locally, there used to be solutions such as FileSaver and StreamSaver. However, the File specification has since been added to browsers, so the browser itself now supports and optimizes Stream-based reading by default, and we do not need to do extra work. The related work: github.com/whatwg/fs. Different browser versions still have compatibility issues, though, so we can fall back on FileSaver and similar libraries for compatibility.

3.2 Requesting large static resource files

    If we are fetching a large static resource file in the browser, in general we only need to request it in slices via Range requests. Typical CDN-accelerated domains, whether on Alibaba Cloud or Tencent Cloud, support ranged requests very well. We can serve the resource through a CDN and then request the CDN-accelerated resource directly from the browser.

    The steps for fetching a large CDN static resource in slices are as follows. First, get the file size through a HEAD request:

const getHeaderInfo = async (url: string) => {
  const res: any = await axios.head(url + `?${Math.random()}`);
  return res?.headers;
};
const header = await getHeaderInfo(source_url);
const size = header['content-length'];

We can get the file size from the content-length property of the response headers. Then we split it into slices and segments, and finally issue Range requests:

const getRangeInfo = async (url: string, start: number, end: number) => {
  const data = await axios({
    method: 'get',
    url,
    headers: {
      range: `bytes=${start}-${end}`,
    },
    responseType: 'blob',
  });
  return data?.data;
};

By specifying range: bytes=${start}-${end} in the headers, we can issue a ranged request for a segment of the resource; start and end here are also both inclusive.
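The header formatting can be captured in a tiny pure helper (the name rangeHeader is my own, created for illustration); since both ends are inclusive, a range of [start, end] covers end - start + 1 bytes:

```javascript
// Build the inclusive Range header value and the number of bytes it covers.
const rangeHeader = (start, end) => ({
  range: `bytes=${start}-${end}`,
  length: end - start + 1,
})

console.log(rangeHeader(0, 1023))
// { range: 'bytes=0-1023', length: 1024 }
```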

