Given you 256M of memory, how to sort 10G files (1 number per line of file)? How to search a 10G file? How to count the number of occurrences of each keyword in a 10G file
Given you 256M of memory, how to sort 10G files (1 number per line of file)? How to search a 10G file? How to count the number of occurrences of each keyword in a 10G file
Exchange time for space
The specific implementation is to load files in batches and then calculate
java? Use nio and the idea of using mapreduce
I don’t understand php
, but this question seems familiar.
Let’s talk about your ideas.
1. Implementation of sorting
This is a typical problem of single-machine external sorting. The specific method is to first sort in chunks
and then multi-channel merge
into an output file.
2. Search
If the file cannot be processed, it can only be searched through traversal.
If the files can be processed, then the files have been sorted above and you can perform binary search
.
3. Statistics
If the file cannot be processed, there is still no good way but to traverse it once.
If you have already captured the sequence, you can directly perform a binary search. Search both ends for the number of occurrences at the found position.
You can read the book "Programming Pearls", it seems to have this problem.