MongoDB Aggregation
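Beyond basic queries, MongoDB provides a number of powerful aggregation tools. They range from simply counting
the documents in a collection to using MapReduce for complex data analysis.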
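1. count
count returns the number of documents in a collection:
db.refactor.count()
Without a query, this returns quickly no matter how large the collection is.
You can also pass a query, in which case MongoDB counts the documents that match it:
db.refactor.count({"username":"refactor"})
Adding a query condition makes count slower, because the matching documents have to be found first.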
2. distinct
distinct finds all the different values of a given key. You must specify both the collection and the key.
For example:
db.runCommand({"distinct":"refactor","key":"username"})
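To make the result of distinct concrete, here is a minimal plain-JavaScript sketch of the same computation on an in-memory array. It is only an illustration and not how the server implements the command: sampleDocs and distinctValues are hypothetical names, and details such as array-valued fields are ignored.

```javascript
// Hypothetical sample data standing in for the "refactor" collection.
const sampleDocs = [
  { username: "refactor", age: 42 },
  { username: "alice", age: 30 },
  { username: "refactor", age: 51 },
  { age: 25 } // no username key, so it contributes no value
];

// Collect each different value of `key` once, in first-seen order.
function distinctValues(docs, key) {
  const seen = new Set();
  for (const doc of docs) {
    if (key in doc) {
      seen.add(doc[key]);
    }
  }
  return Array.from(seen);
}

const usernames = distinctValues(sampleDocs, "username");
console.log(usernames); // [ 'refactor', 'alice' ]
```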
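3. group
group first selects the key to group by; MongoDB then splits the collection into groups according to the different
values of that key. The documents in each group are then aggregated to produce one result document per group.
Note that the group command was deprecated in MongoDB 3.4 and removed in 4.2, so these examples apply to older servers.
For example: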
db.runCommand(
{
    "group":
    {
        "ns":"refactor",
        "key":{"username":true},
        "initial":{"count":0},
        "$reduce":function(doc,prev)
        {
            prev.count++;
        },
        "condition":{"age":{"$gt":40}}
    }
}
)
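"ns":"refactor",
The collection to group.
"key":{"username":true},
The key to group by, here username. All documents with the same username value go into the same group; true means the value of username is returned in the result.
"initial":{"count":0},
The initial value of the accumulator document. Every group starts from its own copy, and all members of a group share that group's accumulator.
"$reduce":function(doc,prev){...}
Called once for each document being grouped. The system passes it two arguments: the current document and the accumulator document of that document's group.
"condition":{"age":{"$gt":40}}
A filter: only documents whose age is greater than 40 are processed.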
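How the parameters described above interact can be sketched in plain JavaScript on an in-memory array. This is a simplified model, not the server's implementation: simulateGroup and people are hypothetical names, and the condition is written as a predicate function standing in for the query document {"age":{"$gt":40}}.

```javascript
// Simplified in-memory model of the group parameters (illustration only).
function simulateGroup(docs, spec) {
  const groups = new Map();
  for (const doc of docs) {
    // "condition": skip documents that fail the filter.
    if (spec.condition && !spec.condition(doc)) continue;

    // "key": build the grouping key from the selected fields.
    const keyDoc = {};
    for (const field of Object.keys(spec.key)) {
      keyDoc[field] = doc[field];
    }
    const id = JSON.stringify(keyDoc);

    // "initial": each group starts from its own copy of the accumulator.
    if (!groups.has(id)) {
      groups.set(id, Object.assign({}, keyDoc, spec.initial));
    }

    // "$reduce": called with (current document, accumulator of its group).
    spec.reduce(doc, groups.get(id));
  }
  return Array.from(groups.values());
}

// Hypothetical sample data.
const people = [
  { username: "refactor", age: 45 },
  { username: "refactor", age: 50 },
  { username: "alice", age: 41 },
  { username: "bob", age: 30 }
];

const grouped = simulateGroup(people, {
  key: { username: true },
  initial: { count: 0 },
  reduce: function (doc, prev) { prev.count++; },
  condition: function (doc) { return doc.age > 40; }
});

console.log(grouped);
// [ { username: 'refactor', count: 2 }, { username: 'alice', count: 1 } ]
```

bob is filtered out by the condition, so no group is created for him.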
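4. Using a finalizer
A finalizer trims down the data sent from the database to the user. This matters because the output of the group command must fit in a single database response.
"finalize" takes a function that is called once for each group before the results are sent to the client.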
db.runCommand(
{
    "group":
    {
        "ns":"refactor",
        "key":{"username":true},
        "initial":{"count":0},
        "$reduce":function(doc,prev)
        {
            prev.count++;
        },
        "finalize":function(doc)
        {
            doc.num=doc.count;
            delete doc.count;
        }
    }
}
)
finalize can either modify the document passed to it or return a new value.
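The two styles of finalizer can be compared with a small plain-JavaScript sketch. applyFinalize is a hypothetical helper that models the behaviour described above (use the return value if there is one, otherwise keep the modified document); it is not a MongoDB API.

```javascript
// Apply a finalizer to each group's result document (illustration only).
function applyFinalize(results, finalize) {
  return results.map(function (doc) {
    const returned = finalize(doc);
    // A returned value replaces the document; otherwise keep the modified one.
    return returned === undefined ? doc : returned;
  });
}

// Style 1: modify the argument in place, as in the command above.
const renamed = applyFinalize(
  [{ username: "refactor", count: 2 }],
  function (doc) { doc.num = doc.count; delete doc.count; }
);

// Style 2: return a new value.
const replaced = applyFinalize(
  [{ username: "alice", count: 1 }],
  function (doc) { return { username: doc.username, num: doc.count }; }
);

console.log(renamed);  // [ { username: 'refactor', num: 2 } ]
console.log(replaced); // [ { username: 'alice', num: 1 } ]
```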
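5. Using a function as the key
Sometimes the grouping criterion is more complicated than a single key. Suppose you use group to count how many blog posts
there are in each category. With many authors, categories may have been capitalized inconsistently, so grouping by
category name would put "MongoDB" and "mongodb" in different groups.
To remove the effect of case, define a function that computes the key each document is grouped by.
This function is specified with $keyf (the example below applies the same idea to the username key):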
db.runCommand(
{
    "group":
    {
        "ns":"refactor",
        "$keyf":function(doc){return {"username":doc.username.toLowerCase()}},
        "initial":{"count":0},
        "$reduce":function(doc,prev)
        {
            prev.count++;
        }
    }
}
)
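The effect of a key function can be tried out without a server. The sketch below is a plain-JavaScript model under the same simplifying assumptions as before: groupByKeyFunction and posts are hypothetical names, and the key function is the one from the command above.

```javascript
// Group documents by the key document that keyf computes (illustration only).
function groupByKeyFunction(docs, keyf, initial, reduce) {
  const groups = new Map();
  for (const doc of docs) {
    const keyDoc = keyf(doc);
    const id = JSON.stringify(keyDoc);
    if (!groups.has(id)) {
      groups.set(id, Object.assign({}, keyDoc, initial));
    }
    reduce(doc, groups.get(id));
  }
  return Array.from(groups.values());
}

// Hypothetical sample data with inconsistent capitalization.
const posts = [
  { username: "MongoDB" },
  { username: "mongodb" },
  { username: "Refactor" }
];

const byLowerCase = groupByKeyFunction(
  posts,
  function (doc) { return { username: doc.username.toLowerCase() }; },
  { count: 0 },
  function (doc, prev) { prev.count++; }
);

console.log(byLowerCase);
// [ { username: 'mongodb', count: 2 }, { username: 'refactor', count: 1 } ]
```

"MongoDB" and "mongodb" now land in the same group because both map to the key {username: "mongodb"}.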
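6. MapReduce
Everything count, distinct and group can do, MapReduce can do as well. It is an aggregation method that parallelizes
easily across multiple servers: it splits the problem up, sends the pieces to different machines, and lets each machine
solve its part. When all the machines have finished, the results are merged into the final, complete result.
MapReduce involves several steps:
1. Map: an operation is applied to every document in the collection. It either does nothing or emits a key and n values.
2. Shuffle: the emitted pairs are grouped by key, and the values for each key are collected into a list.
3. Reduce: the values in a list are reduced to a single value, which is returned.
4. Re-shuffle: steps 2 and 3 repeat until each key's list holds a single value, which is the final result.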
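The four steps above can be sketched as a tiny single-process simulation in plain JavaScript. This is an illustration under stated assumptions, not how MongoDB schedules the work: simulateMapReduce, emitted and the sample documents are hypothetical, emit is defined locally (the server provides it in a real job), and the reduce step deliberately works on chunks of two values so that results are fed back into reduce.

```javascript
// Pairs collected during the map phase (simulation only).
let emitted = [];
function emit(key, value) {
  emitted.push({ key: key, value: value });
}

function simulateMapReduce(docs, map, reduce) {
  // 1. Map: call map once per document, with `this` bound to the document.
  emitted = [];
  for (const doc of docs) {
    map.call(doc);
  }

  // 2. Shuffle: collect the emitted values into one list per key.
  const lists = new Map();
  for (const pair of emitted) {
    if (!lists.has(pair.key)) lists.set(pair.key, []);
    lists.get(pair.key).push(pair.value);
  }

  // 3 and 4. Reduce and re-shuffle: reduce in chunks of two and feed the
  // results back in until a single value is left for each key.
  const result = {};
  for (const [key, values] of lists) {
    let pending = values;
    while (pending.length > 1) {
      const next = [];
      for (let i = 0; i < pending.length; i += 2) {
        next.push(reduce(key, pending.slice(i, i + 2)));
      }
      pending = next;
    }
    result[key] = pending[0];
  }
  return result;
}

// Count documents per username, something group could also do.
const usernameCounts = simulateMapReduce(
  [{ username: "refactor" }, { username: "alice" },
   { username: "refactor" }, { username: "refactor" }],
  function () { emit(this.username, 1); },
  function (key, values) {
    let total = 0;
    for (const v of values) total += v;
    return total;
  }
);

console.log(usernameCounts); // { refactor: 3, alice: 1 }
```

Note that a key with only one emitted value ("alice" here) never reaches reduce, so the emitted value and the reduce result need to have the same shape.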
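MapReduce is slower than group, and group is not fast either. Avoid calling MapReduce in the real-time path of an application;
instead, run it in the background to build a collection that holds the results, and query that collection in real time.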
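Finding all the keys in a collection
MongoDB has no schema, so it does not know which keys each document has. In general, the best way to find all the keys in a collection is MapReduce.
In the map phase we want every key of every document. The map function uses emit to return the values to be processed; emit gives MapReduce a key and a value.
Here emit returns a count for one key of the document ({count:1}). Each key is counted separately, so emit is called once for every key in the document.
this refers to the current document: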
map=function(){
    for(var key in this)
    {
        emit(key,{count:1});
    }
};
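This produces a large number of {count:1} documents, each associated with a key in the collection. An array of one or more
of these {count:1} documents is passed to the reduce function. reduce takes two arguments: key, which is the first value
passed to emit, and an array of one or more {count:1} documents emitted for that key.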
reduce=function(key,emits){
    var total=0; // declared with var so it is not an implicit global
    for(var i=0;i<emits.length;i++){
        total+=emits[i].count;
    }
    return {count:total};
};
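reduce must be callable repeatedly, on the output of the map phase or of an earlier reduce step. The document reduce returns
must therefore be usable as an element of its own second argument. For example, suppose key x maps to three documents
{"count":1,id:1},{"count":1,id:2},{"count":1,id:3}, where the id key only serves to tell them apart. MongoDB might call reduce like this: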
>r1=reduce("x",[{"count":1,id:1},{"count":1,id:2}])
{count:2}
>r2=reduce("x",[{"count":1,id:3}])
{count:1}
>reduce("x",[r1,r2])
{count:3}
reduce has to handle any mix of emitted documents and results of earlier reduce calls.
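This requirement can be checked outside the database. The sketch below is plain JavaScript and does not talk to MongoDB: countReduce repeats the key-counting reduce logic from this section, while brokenReduce is a hypothetical counterexample whose return value has a different shape from the emitted documents.

```javascript
// Same logic as the reduce function for counting keys.
function countReduce(key, emits) {
  var total = 0;
  for (var i = 0; i < emits.length; i++) {
    total += emits[i].count;
  }
  return { count: total };
}

// Counterexample: returns a bare number instead of a {count: n} document.
function brokenReduce(key, emits) {
  var total = 0;
  for (var i = 0; i < emits.length; i++) {
    total += emits[i].count;
  }
  return total;
}

const emittedForX = [
  { count: 1, id: 1 },
  { count: 1, id: 2 },
  { count: 1, id: 3 }
];

// Reducing everything at once and reducing in stages must agree.
const allAtOnce = countReduce("x", emittedForX);
const inStages = countReduce("x", [
  countReduce("x", emittedForX.slice(0, 2)),
  countReduce("x", emittedForX.slice(2))
]);
console.log(allAtOnce, inStages); // { count: 3 } { count: 3 }

// The broken version fails as soon as its own output is fed back in.
const brokenInStages = brokenReduce("x", [
  brokenReduce("x", emittedForX.slice(0, 2)),
  brokenReduce("x", emittedForX.slice(2))
]);
console.log(brokenInStages); // NaN
```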
With map and reduce defined, the job can be run with runCommand:
mr=db.runCommand(
{
    "mapreduce":"refactor",
    "map":map,
    "reduce":reduce,
    "out":{inline:1}
}
)
Or with the collection helper:
db.refactor.mapReduce(map,reduce,{out:{inline:1}})
