Tencent has a campus recruitment question that goes like this:
For a PC with only 2G of memory, write an algorithm to find the median of a file containing 10G of integers.
There are several solutions, such as bucket sort and radix sort, but I came across a method that uses a heap, and I don't quite understand it. The original description goes like this:
First find the 1G-th largest element, then use that element to find the 2G-th largest, then use the 2G-th largest to find the 3G-th largest... Although this does not require sorting, it involves more disk operations, and the specifics need to be analyzed: which requires more disk IO, this approach or external sorting?
Build a max-heap of 1G integers; if an element is smaller than the maximum, put it into the heap. In this way you can get the 1G-th largest element. Then use this element to rebuild the heap once more, this time adding a comparison against the 1G-th largest element to the condition for entering the heap, so that after the heap is built you get the 2G-th largest element.
I was confused when I read this. First, I don't understand what "1G" refers to. Does it mean dividing the 10G of data into 10 parts and then sorting 1G of data at a time to find the largest?
I hope you can give me some advice on this solution, thank you.
A heap is a data structure: a complete binary tree (so no pointers are needed to store its structure) in which each node is either no larger than or no smaller than its child nodes, corresponding to a max-heap ("large top heap") and a min-heap ("small top heap") respectively. In a min-heap, for example, the root element is the smallest, so the largest or smallest element can be found in O(1) time, while deleting and inserting an element costs O(log N) in the worst case.
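For concreteness, here is a tiny illustration (mine, not from the post) of those properties using Python's standard heapq module, which stores a binary min-heap in a plain list:

```python
# Min-heap ("small top heap") basics with heapq: the list itself is the tree,
# so no pointers are stored -- only array indices.
import heapq

h = []
for v in [7, 2, 9, 4]:
    heapq.heappush(h, v)    # insert: O(log N) in the worst case

print(h[0])                 # peek at the minimum: O(1)      -> 2
print(heapq.heappop(h))     # delete the minimum: O(log N)   -> 2
print(h[0])                 # new minimum                    -> 4
```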
What this passage means is: first create a min-heap that can hold (1G+1) elements, so the smallest element in it can be found quickly. Then traverse all 10G of data and insert each element; whenever the heap becomes full after an insertion (it holds 1G+1 elements), simply delete the smallest one.
When the traversal finishes, the heap contains the largest 1G elements, and the top of the heap is the smallest of them. In other words, if the data were sorted in descending order, this element would sit at position 1G; call this number x.
Then repeat the process, with one difference: only elements less than x are inserted into the heap, and elements greater than or equal to x are skipped. That is, ignoring the 1G elements just found, find the next largest 1G; the top of the resulting heap is the element at position 2G in the descending order. (Be careful with duplicate values, though.) Repeat until the median position is reached.
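Here is a minimal sketch of this repeated-pass idea, run on a small in-memory list purely for illustration (my own sketch, not code from the post): `chunk` stands in for the 1G heap capacity, each call to `kth_pass` stands for one full scan of the 10G file, the (1G+1)-then-delete-the-minimum step is replaced by the equivalent push-or-replace on a heap of size `chunk`, and the duplicate-value caveat is ignored (values are assumed distinct).

```python
import heapq

def kth_pass(data, chunk, upper_bound=None):
    """One pass: keep the largest `chunk` values strictly below `upper_bound`
    in a min-heap and return the smallest of them, i.e. the value sitting at
    the next `chunk` boundary of the descending order."""
    heap = []                               # min-heap of at most `chunk` values
    for v in data:                          # in reality: stream the file from disk
        if upper_bound is not None and v >= upper_bound:
            continue                        # skip values found in earlier passes
        if len(heap) < chunk:
            heapq.heappush(heap, v)
        elif v > heap[0]:
            heapq.heapreplace(heap, v)      # pop the current minimum, push v
    return heap[0]

def median_by_passes(data, chunk):
    """Walk down the descending order `chunk` values at a time until the
    median position is reached."""
    target = (len(data) + 1) // 2           # median position, counted from the top
    bound, passed = None, 0
    while passed + chunk < target:
        bound = kth_pass(data, chunk, bound)   # value at position passed + chunk
        passed += chunk
    # final (possibly partial) pass lands exactly on the median position
    return kth_pass(data, target - passed, bound)

data = [9, 1, 4, 7, 3, 8, 2, 6, 5, 0]
print(median_by_passes(data, chunk=3))      # -> 5 (the upper median of 0..9)
```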
I have a question: can't this kind of problem be solved with a bitmap?
1. Traverse the 10G numbers, mark in the bitmap which values have appeared, and count how many distinct numbers appear; record this as count.
2. Traverse the bitmap and find the (count/2)-th set bit; that value is the answer.
If the integers are 4-byte values in the range 0 ~ 2^32-1, then covering the whole value range only needs a 512MB bitmap, right?
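A rough sketch of those two steps (my own illustration, assuming 4-byte unsigned integers). One caveat the post does not address: a plain bitmap records only whether a value appears, not how many times, so this yields the median of the distinct values and matches the true median only when duplicates are absent or acceptable to ignore.

```python
def bitmap_median(numbers, value_bits=32):
    bitmap = bytearray(1 << (value_bits - 3))    # 2^32 bits = 512MB for 32-bit values
    count = 0
    # Step 1: mark every value that appears and count the distinct values.
    for v in numbers:                            # in reality: stream the 10G file
        byte, bit = v >> 3, v & 7
        if not (bitmap[byte] >> bit) & 1:
            bitmap[byte] |= 1 << bit
            count += 1
    # Step 2: scan the bitmap in value order and stop at the (count/2)-th set bit.
    target = (count + 1) // 2
    seen = 0
    for v in range(1 << value_bits):
        if (bitmap[v >> 3] >> (v & 7)) & 1:
            seen += 1
            if seen == target:
                return v
    return None

# Small demo with 8-bit values so the bitmap is only 32 bytes:
print(bitmap_median([5, 1, 9, 3, 7], value_bits=8))   # -> 5
```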
I still have a few follow-up questions about this method; if anyone finds them interesting, please take a look:
Since the data is divided into 10 heaps, what is the common boundary value between the heaps?
If it is divided into ten heaps, how many comparisons does quicksort need in the worst case?
Is it possible to reduce the amount of calculation?
@zonxin