java - How do you read a large file that is bigger than memory?
PHP中文网 2017-04-18 10:55:16
Reference:
    Given a 1 GB file and a memory limit of 10 MB, how do you return the 50 most frequent words, in order?

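There are many solutions to this problem online. They all use a divide-and-conquer approach, and they all mention traversing the entire file.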

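So my question is:
If I simply read a large file line by line, does that count as loading the whole 1 GB file into memory?
Or, put another way, how should a file larger than memory be read?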


All replies (6)
黄舟

Think of memory as a pipe. Reading line by line passes the 1 GB file through memory a little at a time, and the 10 MB limit is the width of the pipe.
So yes, reading line by line does load the whole 1 GB file into memory at some point, just never all of it at once.

伊谢尔伦
try (BufferedReader in = new BufferedReader(new FileReader(file))) {
    String line;
    while ((line = in.readLine()) != null) {
        // parse line
    }
}

However large the file is, as long as the length of each line is bounded, reading the whole file may take a long time but will never use much memory.

伊谢尔伦

Read the file in chunks, compute one result set per chunk, and aggregate the result sets at the end.
If you are processing text, it helps to know the number of lines in advance.
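
As a rough sketch of "one result set per chunk, then aggregate", applied to the top-50-words question above: the input is streamed once and each word is written to a bucket file chosen by its hash, so every occurrence of a word lands in the same bucket. Each bucket is then counted on its own and only the k best entries seen so far are kept. The class name TopWords, the method topK, and the buckets parameter are made up for illustration, and the sketch assumes UTF-8 text, whitespace-separated words, and that the distinct words of any one bucket fit in memory.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.PriorityQueue;

public class TopWords {
    /** Returns the k most frequent words in input, most frequent first (hypothetical helper). */
    public static List<Map.Entry<String, Integer>> topK(Path input, int buckets, int k)
            throws IOException {
        Path dir = Files.createTempDirectory("buckets");
        BufferedWriter[] out = new BufferedWriter[buckets];
        for (int i = 0; i < buckets; i++) {
            out[i] = Files.newBufferedWriter(dir.resolve("b" + i), StandardCharsets.UTF_8);
        }
        // Pass 1: stream the input line by line and route each word to a bucket by hash.
        try (BufferedReader in = Files.newBufferedReader(input, StandardCharsets.UTF_8)) {
            String line;
            while ((line = in.readLine()) != null) {
                for (String w : line.split("\\s+")) {
                    if (w.isEmpty()) continue;  // leading whitespace yields an empty token
                    BufferedWriter b = out[Math.floorMod(w.hashCode(), buckets)];
                    b.write(w);
                    b.newLine();
                }
            }
        } finally {
            for (BufferedWriter b : out) b.close();
        }
        // Pass 2: count one bucket at a time; a min-heap keeps only the k best entries.
        PriorityQueue<Map.Entry<String, Integer>> heap =
                new PriorityQueue<>((a, b) -> Integer.compare(a.getValue(), b.getValue()));
        for (int i = 0; i < buckets; i++) {
            Path bucket = dir.resolve("b" + i);
            Map<String, Integer> counts = new HashMap<>();
            try (BufferedReader in = Files.newBufferedReader(bucket, StandardCharsets.UTF_8)) {
                String w;
                while ((w = in.readLine()) != null) counts.merge(w, 1, Integer::sum);
            }
            for (Map.Entry<String, Integer> e : counts.entrySet()) {
                heap.add(new AbstractMap.SimpleImmutableEntry<>(e.getKey(), e.getValue()));
                if (heap.size() > k) heap.poll();  // drop the current least frequent entry
            }
            Files.delete(bucket);
        }
        Files.delete(dir);
        List<Map.Entry<String, Integer>> result = new ArrayList<>(heap);
        result.sort((a, b) -> Integer.compare(b.getValue(), a.getValue()));
        return result;
    }
}
```

Something like TopWords.topK(path, 256, 50) would be one way to call it for the question's numbers. The bucket count is a guess that depends on how many distinct words the file contains, and the order of words with equal counts is unspecified.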

小葫芦

Linux has a command called split that quickly divides a large text file into smaller files, which can then be processed conveniently, even concurrently. Processing the pieces separately and merging the results is the idea behind external sorting.
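
For illustration, here is a minimal external merge sort over the lines of a file, written in Java rather than with split itself. It cuts the input into runs of at most maxLines lines, sorts each run in memory and writes it to a temporary file, then merges the runs with a priority queue so that only one line per run is held at a time. The names ExternalSort, Run, and maxLines are hypothetical, and the sketch assumes UTF-8 text and maxLines of at least 1.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.PriorityQueue;

public class ExternalSort {
    // One open run file plus the line currently at its head.
    private static final class Run implements Comparable<Run> {
        final BufferedReader reader;
        String head;

        Run(Path p) throws IOException {
            reader = Files.newBufferedReader(p, StandardCharsets.UTF_8);
            head = reader.readLine();
        }

        boolean advance() throws IOException {
            head = reader.readLine();
            return head != null;
        }

        @Override
        public int compareTo(Run other) {
            return head.compareTo(other.head);
        }
    }

    /** Sorts the lines of input into output, buffering at most maxLines input lines. */
    public static void sort(Path input, Path output, int maxLines) throws IOException {
        List<Path> runs = new ArrayList<>();
        // Phase 1: cut the input into sorted runs, each small enough to sort in memory.
        try (BufferedReader in = Files.newBufferedReader(input, StandardCharsets.UTF_8)) {
            List<String> chunk = new ArrayList<>();
            String line;
            while ((line = in.readLine()) != null) {
                chunk.add(line);
                if (chunk.size() == maxLines) {
                    runs.add(writeRun(chunk));
                    chunk.clear();
                }
            }
            if (!chunk.isEmpty()) runs.add(writeRun(chunk));
        }
        // Phase 2: k-way merge; the heap always yields the run with the smallest head line.
        PriorityQueue<Run> heap = new PriorityQueue<>();
        for (Path p : runs) {
            Run r = new Run(p);
            if (r.head != null) heap.add(r); else r.reader.close();
        }
        try (BufferedWriter out = Files.newBufferedWriter(output, StandardCharsets.UTF_8)) {
            while (!heap.isEmpty()) {
                Run r = heap.poll();
                out.write(r.head);
                out.newLine();
                if (r.advance()) heap.add(r); else r.reader.close();
            }
        }
        for (Path p : runs) Files.delete(p);
    }

    // Sorts one chunk in memory and writes it to a temporary run file.
    private static Path writeRun(List<String> chunk) throws IOException {
        Collections.sort(chunk);
        Path p = Files.createTempFile("run", ".txt");
        Files.write(p, chunk, StandardCharsets.UTF_8);
        return p;
    }
}
```

With a very large input the number of run files can exceed the open-file limit. A fuller version would merge the runs in several passes.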

刘奇

Memory is like scratch paper. Once a page is full, you turn it over. Data that has been used and is no longer needed is thrown away.

A simple example: create a variable buff with a fixed size, open a file stream, and fill the buffer from it. Once it is full, scan it for the content you are looking for, and record each match in a separate counter variable. Then clear buff and continue loading from the position where the previous read stopped, and so on. When the whole file has been read, the statistics are complete.
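
The loop described above might look roughly like this in Java. To keep the sketch short it counts occurrences of a single byte value, which avoids the case of a match that straddles two buffer fills. A multi-byte pattern would need to carry the tail of one fill over into the next. BufferScan, countByte, and bufferSize are illustrative names, not part of any library.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;

public class BufferScan {
    /** Counts how often target occurs in the file, reusing one fixed-size buffer. */
    public static long countByte(Path file, byte target, int bufferSize) throws IOException {
        byte[] buff = new byte[bufferSize];  // must be > 0; reused for every fill
        long count = 0;                      // the separate variable holding the statistics
        try (InputStream in = Files.newInputStream(file)) {
            int n;
            // Each read() overwrites buff from index 0 and resumes where the last read stopped.
            while ((n = in.read(buff)) != -1) {
                for (int i = 0; i < n; i++) {
                    if (buff[i] == target) count++;
                }
            }
        }
        return count;
    }
}
```

Memory use stays at roughly bufferSize bytes however large the file is. For example, BufferScan.countByte(path, (byte) '\n', 8192) counts the newline bytes in a file.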

阿神

Operating systems provide APIs for working with files larger than memory by treating the file itself as if it were memory:

Memory mapping

  • mmap (POSIX systems such as Linux)

  • CreateFileMapping (Windows)
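
Since the question is about Java: the JDK exposes memory mapping through FileChannel.map, which returns a MappedByteBuffer. A single mapping cannot exceed Integer.MAX_VALUE bytes, so the sketch below maps the file one window at a time. MappedScan, countNewlines, and windowSize are hypothetical names chosen for the example, and windowSize is assumed to be between 1 and Integer.MAX_VALUE.

```java
import java.io.IOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public class MappedScan {
    /** Counts newline bytes by mapping the file one read-only window at a time. */
    public static long countNewlines(Path file, long windowSize) throws IOException {
        long count = 0;
        try (FileChannel ch = FileChannel.open(file, StandardOpenOption.READ)) {
            long size = ch.size();
            for (long pos = 0; pos < size; pos += windowSize) {
                long len = Math.min(windowSize, size - pos);
                // The OS pages the mapped region in on demand; it is not copied onto the Java heap.
                MappedByteBuffer buf = ch.map(FileChannel.MapMode.READ_ONLY, pos, len);
                while (buf.hasRemaining()) {
                    if (buf.get() == '\n') count++;
                }
            }
        }
        return count;
    }
}
```

Mapped pages live outside the Java heap, so they do not count against -Xmx, but they still use address space and physical memory as the operating system sees fit.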
