Thirty thousand documents, each containing a single random number: {"digit": <random number>}.
Requirement: find the number that appears most often.
The data is stored in a MongoDB collection named table.
import time

# `table` is the pymongo collection mentioned above
# (the connection setup was omitted in the original post).

def main():
    # Pull every document and collect the digits.
    digits = []
    for d in table.find():
        digits.append(d['digit'])
    # For each distinct digit, issue a separate count query.
    # This is the slow part: one round trip to MongoDB per distinct value.
    news = []
    for i, d in enumerate(set(digits)):
        c = table.find({"digit": d}).count()
        news.append((d, c))
        print(i)
    # The most frequent digit is the pair with the largest count.
    print(max(news, key=lambda x: x[1]))

if __name__ == '__main__':
    start = time.time()
    main()
    print('Cost: {}'.format(time.time() - start))
A single run takes about five or six minutes. Splitting the work across 100 threads is not much faster, and the fan gets very loud...
What is the correct approach here?
The correct approach is to use aggregation.
For $group, see the MongoDB documentation.
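As a minimal sketch (assuming pymongo, and treating the connection details and the database name db as placeholders to adjust for your setup), a single $group pipeline lets the server do all the counting in one pass, replacing the one-query-per-distinct-value loop above:

import time
from pymongo import MongoClient

# Assumed connection details -- adjust to match your environment.
client = MongoClient('localhost', 27017)
table = client['db']['table']

def most_frequent_digit():
    # Group documents by digit, count each group on the server,
    # sort the groups by count, and keep only the largest one.
    pipeline = [
        {'$group': {'_id': '$digit', 'count': {'$sum': 1}}},
        {'$sort': {'count': -1}},
        {'$limit': 1},
    ]
    # Returns a document like {'_id': <digit>, 'count': <n>}, or None if empty.
    return next(table.aggregate(pipeline), None)

if __name__ == '__main__':
    start = time.time()
    print(most_frequent_digit())
    print('Cost: {}'.format(time.time() - start))

The key difference is that this sends one command to MongoDB and scans the collection once, instead of issuing a separate count query per distinct value.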
Note that this kind of requirement rarely appears in real systems; it is presumably a practice exercise. Even with aggregation, finding the most frequent number still requires traversing every document in the collection, so when the collection is large, such a full scan cannot be fast. Queries like this usually belong to OLAP scenarios, and OLAP generally does not have strict latency requirements. So, purely as a matter of theory, the aggregation framework is the right tool, but a real-world requirement would still call for case-by-case analysis.