1. How do I sort data from multiple MongoDB collections after analyzing it with Python?
2. The concrete scenario: suppose there are two models, a users collection and a purchase-records collection. The records collection stores the amount a user spends on each purchase. The question is: how do I sort users in descending order by their cumulative spending (assume there are 5 such aggregate fields)?
3. The scenario is made up to illustrate the problem; in reality there are many aggregate fields. Assuming the users collection has 1,000,000 documents, the records collection has 1,000,000 documents, and the server has 4 cores / 8 threads, can the wait time per page of 20 records stay under 3 seconds?
4. If I compute the totals for every user and then sort with Python's sorted(), will it really be that inefficient?
Create MongoDB indexes on the fields you need to filter by (MongoDB supports multiple indexes per collection). With indexes in place, lookups should be much faster, and you can let MongoDB's own API do the sorting.
I haven't run into the 1,000,000-document case, but at a scale of 10,000 to 100,000 records I recall queries taking roughly 500 ms.
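A minimal sketch of that idea with pymongo. The collection and field names (users, purchase_records, user_id, total_spent) are assumptions for illustration, and total_spent is assumed to be a pre-computed field on the user document:

```python
from pymongo import MongoClient, DESCENDING

client = MongoClient("mongodb://localhost:27017")
db = client["shop"]  # hypothetical database name

# Index the fields used for filtering and sorting.
db.purchase_records.create_index("user_id")
db.users.create_index([("total_spent", DESCENDING)])

# Let MongoDB do the sorting and paging instead of Python:
# 20 documents per page, ordered by the pre-computed total, descending.
page = 0
cursor = (db.users.find()
          .sort("total_spent", DESCENDING)
          .skip(page * 20)
          .limit(20))
for doc in cursor:
    print(doc["_id"], doc.get("total_spent"))
```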
For comparison, without the index it was unusably slow. Also, if the statistics you need are important and queried frequently, consider creating a separate collection and refreshing it periodically (for example from a queue or a scheduled job), trading space for time. That collection could hold fields such as: user ID, total purchases in the past 3 hours, past 12 hours, past 24 hours, past day, past month, all-time total, and so on. The downside is some wasted space and data that is not real-time, but the advantage is obvious: looking up a user's total spending becomes a simple query with a millisecond-level response.
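A hedged sketch of that "trade space for time" approach: a scheduled job rebuilds a separate stats collection from the raw records. All names (user_purchase_stats, created_at, amount) and the time windows are illustrative assumptions, not from the original answer:

```python
from datetime import datetime, timedelta
from pymongo import MongoClient, UpdateOne

client = MongoClient("mongodb://localhost:27017")
db = client["shop"]  # hypothetical database name

def refresh_purchase_stats():
    now = datetime.utcnow()
    windows = {
        "past_3h":  now - timedelta(hours=3),
        "past_24h": now - timedelta(hours=24),
        "past_30d": now - timedelta(days=30),
    }
    ops = []
    for field, since in windows.items():
        # Sum each user's spending inside the window, server-side.
        pipeline = [
            {"$match": {"created_at": {"$gte": since}}},
            {"$group": {"_id": "$user_id", "total": {"$sum": "$amount"}}},
        ]
        for row in db.purchase_records.aggregate(pipeline):
            ops.append(UpdateOne({"_id": row["_id"]},
                                 {"$set": {field: row["total"],
                                           "updated_at": now}},
                                 upsert=True))
    if ops:
        db.user_purchase_stats.bulk_write(ops)

# Run refresh_purchase_stats() on a schedule (cron, Celery beat, etc.);
# reads against user_purchase_stats then stay at millisecond latency.
```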
The above is just one person's opinion, for reference only.
You can load all collection data into memory and then process it.
MongoDB is not good at joining data across collections, so where possible design the schema so the data you need lives together in a single collection.
For single-collection queries, create indexes. The recommended order of query methods is: basic query -> aggregation -> map-reduce; from left to right the methods become more flexible but less efficient.
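As a sketch of the "aggregation" step in that ordering, the per-user totals and the descending sort can both be done server-side in one pipeline. The field names (user_id, amount) are assumed for illustration:

```python
from pymongo import MongoClient

client = MongoClient("mongodb://localhost:27017")
db = client["shop"]  # hypothetical database name

pipeline = [
    # Sum each user's spending across all their purchase records.
    {"$group": {"_id": "$user_id", "total_spent": {"$sum": "$amount"}}},
    # Sort descending and take the first page of 20.
    {"$sort": {"total_spent": -1}},
    {"$limit": 20},
]
top_spenders = list(db.purchase_records.aggregate(pipeline, allowDiskUse=True))
```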
Queries across multiple collections have to be implemented yourself: query each collection separately and combine the results in application code (a sketch follows below).
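A minimal sketch of that client-side combination, which also speaks to question 4: merging two collections in Python and ranking with sorted(). Collection and field names are assumptions for illustration:

```python
from pymongo import MongoClient

client = MongoClient("mongodb://localhost:27017")
db = client["shop"]  # hypothetical database name

# 1) Per-user totals from the records collection.
totals = {row["_id"]: row["total"]
          for row in db.purchase_records.aggregate(
              [{"$group": {"_id": "$user_id", "total": {"$sum": "$amount"}}}])}

# 2) Attach the totals to the user documents.
users = list(db.users.find({}, {"name": 1}))
for u in users:
    u["total_spent"] = totals.get(u["_id"], 0)

# 3) sorted() over ~1,000,000 dicts is O(n log n) and typically not the
#    bottleneck; fetching and merging the data usually dominates.
ranked = sorted(users, key=lambda u: u["total_spent"], reverse=True)
first_page = ranked[:20]
```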
For data with particularly high timeliness requirements, add an intermediate cache layer and design an update strategy for it.