java - How do you read a large file that is bigger than memory?

Question

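{code...} There are many solutions to this problem online. They all use a divide-and-conquer approach, and they all mention traversing the entire file. So my question is: if I simply read a large file line by line, does that count as loading the whole 1G file into memory? Put another way, how should a file that is larger than memory be read?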

黄舟 · Answer

Memory here is like a pipe. Reading line by line just passes the 1G file through memory, and the 10M is the thickness of the pipe.
So yes, with line-by-line reading the whole 1G file does get loaded into memory at some point, just never all of it at once.

伊谢尔伦 · Answer

try (BufferedReader in = new BufferedReader(new FileReader(file))) {
    String line;
    while ((line = in.readLine()) != null) {
        // parse line
    }
}

No matter how big the file is, as long as the length of each line is bounded, reading the entire file may take a long time but will not take up much memory.

伊谢尔伦 · Answer

Read the file in chunks, compute one result set for each chunk, and aggregate the result sets at the end.
If you are processing text, it helps to know the number of lines.
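
As a rough illustration of "one result set per chunk, then aggregate", here is a minimal sketch that counts how often each distinct line occurs. The class name `ChunkedAggregate`, the helper names, and the `linesPerChunk` parameter are made up for this example. It also assumes the aggregated result itself is small enough to fit in memory.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class ChunkedAggregate {
    // Counts occurrences of each distinct line, holding at most
    // linesPerChunk lines of the file in memory at a time.
    static Map<String, Long> countLines(Path file, int linesPerChunk) throws IOException {
        Map<String, Long> total = new HashMap<>();
        try (BufferedReader in = Files.newBufferedReader(file)) {
            List<String> chunk = new ArrayList<>(linesPerChunk);
            String line;
            while ((line = in.readLine()) != null) {
                chunk.add(line);
                if (chunk.size() == linesPerChunk) {
                    merge(total, summarize(chunk)); // fold this chunk's result into the total
                    chunk.clear();                  // drop the chunk before reading the next one
                }
            }
            if (!chunk.isEmpty()) {
                merge(total, summarize(chunk));     // last, possibly shorter, chunk
            }
        }
        return total;
    }

    // The "result set" for a single chunk.
    static Map<String, Long> summarize(List<String> chunk) {
        Map<String, Long> partial = new HashMap<>();
        for (String s : chunk) {
            partial.merge(s, 1L, Long::sum);
        }
        return partial;
    }

    // Aggregates one partial result into the running total.
    static void merge(Map<String, Long> total, Map<String, Long> partial) {
        partial.forEach((k, v) -> total.merge(k, v, Long::sum));
    }
}
```

Calling `ChunkedAggregate.countLines(path, 100_000)` would process the file 100,000 lines at a time. If the file has very many distinct lines, the total map itself can grow large, and this sketch does not address that.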

高洛峰 · Answer

Linux has a command called split. It can quickly divide a large text file into small files, which can then be processed conveniently, even concurrently. Splitting, processing each piece, and merging the results is the idea behind external sorting.
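
To make the external sorting idea concrete, here is a minimal sketch that does the splitting inside Java instead of calling the split command. It sorts each piece in memory, writes it to a temporary file, then merges the sorted pieces. The class name `ExternalSort` and the `maxLines` parameter are invented for this example. The sketch assumes individual lines are short and that the number of temporary files stays small enough to open them all at once.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.PriorityQueue;

class ExternalSort {
    // Sorts the lines of input into output. During the split phase at most
    // maxLines lines are in memory; during the merge, one line per run file.
    static void sortLines(Path input, Path output, int maxLines) throws IOException {
        List<Path> runs = new ArrayList<>();
        // Phase 1: split the input into sorted temporary files ("runs").
        try (BufferedReader in = Files.newBufferedReader(input)) {
            List<String> chunk = new ArrayList<>();
            String line;
            while ((line = in.readLine()) != null) {
                chunk.add(line);
                if (chunk.size() == maxLines) {
                    runs.add(writeSortedRun(chunk));
                    chunk.clear();
                }
            }
            if (!chunk.isEmpty()) {
                runs.add(writeSortedRun(chunk));
            }
        }
        // Phase 2: k-way merge, always emitting the smallest head line.
        List<BufferedReader> readers = new ArrayList<>();
        try (BufferedWriter out = Files.newBufferedWriter(output)) {
            PriorityQueue<Head> heads = new PriorityQueue<>((a, b) -> a.line.compareTo(b.line));
            for (Path run : runs) {
                BufferedReader r = Files.newBufferedReader(run);
                readers.add(r);
                String first = r.readLine();
                if (first != null) heads.add(new Head(first, r));
            }
            while (!heads.isEmpty()) {
                Head h = heads.poll();
                out.write(h.line);
                out.newLine();
                String next = h.reader.readLine(); // refill from the same run
                if (next != null) heads.add(new Head(next, h.reader));
            }
        } finally {
            for (BufferedReader r : readers) r.close();
            for (Path run : runs) Files.deleteIfExists(run);
        }
    }

    // Sorts one chunk in memory and writes it to a new temporary file.
    private static Path writeSortedRun(List<String> chunk) throws IOException {
        Collections.sort(chunk);
        Path run = Files.createTempFile("run", ".txt");
        Files.write(run, chunk);
        return run;
    }

    // The current first line of a run, plus the reader it came from.
    private static final class Head {
        final String line;
        final BufferedReader reader;
        Head(String line, BufferedReader reader) {
            this.line = line;
            this.reader = reader;
        }
    }
}
```

For example, `ExternalSort.sortLines(in, out, 1_000_000)` would sort a file of any length one million lines at a time. Lines are compared with `String.compareTo`, so the order is by UTF-16 code units rather than by locale.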

怪我咯 · Answer

Memory is like scratch paper. Once you have filled a page, you turn it over. Data that has been used and is no longer needed is thrown away.

A simple example: create a variable buff, set its size, open the file stream, and fill the buffer. Once it is full, check it for the content you want. If you find it, record that in another variable. Then clear buff and continue loading from the position where the previous read stopped... When the whole file has been read, the statistics are complete.
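
The steps above can be sketched in Java roughly as follows. The class name `BufferedCount`, the method name, and the choice of counting a single byte value are assumptions made for illustration. A single byte is used so that a match can never straddle two fills of the buffer.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;

class BufferedCount {
    // Counts how many times target occurs in the file while keeping
    // at most bufferSize bytes of the file in memory at any moment.
    static long countByte(Path file, byte target, int bufferSize) throws IOException {
        byte[] buff = new byte[bufferSize]; // the fixed-size "scratch paper"
        long count = 0;                     // the statistic lives outside the buffer
        try (InputStream in = Files.newInputStream(file)) {
            int n;
            // read() overwrites buff with the next bytes of the file and
            // returns how many it stored, or -1 at the end of the file.
            while ((n = in.read(buff)) != -1) {
                for (int i = 0; i < n; i++) {
                    if (buff[i] == target) {
                        count++;
                    }
                }
            }
        }
        return count;
    }
}
```

For instance, `BufferedCount.countByte(path, (byte) '\n', 10 * 1024 * 1024)` would count newline bytes using a 10M buffer. Searching for a pattern longer than one byte would also need to handle matches that span two consecutive fills, which this sketch does not do.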

阿神 · Answer

Different systems provide an API for operating on files larger than memory, that is, for treating the file as if it were memory:

Memory mapping

mmap (POSIX systems such as Linux)
CreateFileMapping (Windows)
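
Since the question is about Java: the same idea is available there through `FileChannel.map`, which returns a `MappedByteBuffer`. Below is a minimal sketch; the class name `MappedCount` and the `windowSize` parameter are made up for this example. A single mapping cannot exceed `Integer.MAX_VALUE` bytes, so the sketch maps the file one window at a time.

```java
import java.io.IOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

class MappedCount {
    // Counts occurrences of target by mapping the file window by window
    // instead of copying it into a heap buffer. windowSize must be positive.
    static long countByte(Path file, byte target, int windowSize) throws IOException {
        long count = 0;
        try (FileChannel ch = FileChannel.open(file, StandardOpenOption.READ)) {
            long size = ch.size();
            for (long pos = 0; pos < size; pos += windowSize) {
                long len = Math.min(windowSize, size - pos);
                // Map bytes [pos, pos + len) of the file read-only.
                MappedByteBuffer window = ch.map(FileChannel.MapMode.READ_ONLY, pos, len);
                while (window.hasRemaining()) {
                    if (window.get() == target) {
                        count++;
                    }
                }
            }
        }
        return count;
    }
}
```

With a mapping, the operating system pages parts of the file in and out as they are touched, so the program never has to hold the whole file in memory at once.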