Anyone who has worked with Hadoop should be familiar with MapReduce. MapReduce is powerful and flexible. It splits a large problem into many small problems and sends each one to a different machine for processing. Once all the machines have finished, their results are combined into a complete solution. This is what is known as distributed computing. In this article, we look at how MapReduce is used in MongoDB.
mapReduce
MapReduce in MongoDB can implement aggregations that are more complex than the ordinary aggregation commands. Using it mainly involves writing two functions: a map function and a reduce function. The map function produces a sequence of key-value pairs. These values are grouped by key and passed to the reduce function, which computes further statistics on them. For example, suppose the sang_books collection contains the following documents:
```
{ "_id" : ObjectId("59fa71d71fd59c3b2cd908d7"), "name" : "鲁迅", "book" : "呐喊", "price" : 38.0, "publisher" : "人民文学出版社" }
{ "_id" : ObjectId("59fa71d71fd59c3b2cd908d8"), "name" : "曹雪芹", "book" : "红楼梦", "price" : 22.0, "publisher" : "人民文学出版社" }
{ "_id" : ObjectId("59fa71d71fd59c3b2cd908d9"), "name" : "钱钟书", "book" : "宋诗选注", "price" : 99.0, "publisher" : "人民文学出版社" }
{ "_id" : ObjectId("59fa71d71fd59c3b2cd908da"), "name" : "钱钟书", "book" : "谈艺录", "price" : 66.0, "publisher" : "三联书店" }
{ "_id" : ObjectId("59fa71d71fd59c3b2cd908db"), "name" : "鲁迅", "book" : "彷徨", "price" : 55.0, "publisher" : "花城出版社" }
```
To compute the total price of each author's books, run the following:
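```javascript
// Group by author name and emit the price of each book
var map = function () { emit(this.name, this.price); };
// Add up all prices emitted for one author
var reduce = function (key, value) { return Array.sum(value); };
// Write the results to the totalPrice collection
var options = { out: "totalPrice" };
db.sang_books.mapReduce(map, reduce, options);
db.totalPrice.find();
```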
The emit function does the grouping and takes two parameters: the first is the key to group by, and the second is the value to be aggregated. The reduce function does the actual processing. It takes two parameters, corresponding to the two parameters of emit: the key, and an array holding every value emitted for that key. Here, Array.sum adds up the prices. options specifies the collection the results are written to, and we then query that collection to see them. By default this output collection is retained, along with its data, even after the database is restarted. The query results are as follows:
```
{ "_id" : "曹雪芹", "value" : 22.0 }
{ "_id" : "钱钟书", "value" : 165.0 }
{ "_id" : "鲁迅", "value" : 93.0 }
```
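To make the map, group and reduce flow concrete outside MongoDB, here is a minimal Node.js sketch that imitates it in memory on the same five documents. It is an illustration under assumptions, not MongoDB's implementation: the helper simulateMapReduce is hypothetical, emit is installed as a temporary global so that the map function can be written exactly as in the shell, and Array.sum (a mongo shell helper) is replaced with Array.prototype.reduce.

```javascript
// The five sang_books documents shown above (_id and publisher omitted).
const books = [
  { name: "鲁迅", book: "呐喊", price: 38.0 },
  { name: "曹雪芹", book: "红楼梦", price: 22.0 },
  { name: "钱钟书", book: "宋诗选注", price: 99.0 },
  { name: "钱钟书", book: "谈艺录", price: 66.0 },
  { name: "鲁迅", book: "彷徨", price: 55.0 },
];

// Hypothetical helper: an in-memory imitation of map, group by key, reduce.
// It mirrors the data flow only; it is not how MongoDB is implemented.
function simulateMapReduce(docs, map, reduce) {
  const groups = new Map(); // key -> array of every value emitted for that key
  // Install emit as a temporary global so map can call it as in the mongo shell.
  globalThis.emit = (key, value) => {
    if (!groups.has(key)) groups.set(key, []);
    groups.get(key).push(value);
  };
  try {
    for (const doc of docs) map.call(doc); // `this` inside map is the current document
  } finally {
    delete globalThis.emit;
  }
  const results = [];
  for (const [key, values] of groups) {
    // MongoDB does not call reduce for a key that emitted a single value.
    const value = values.length === 1 ? values[0] : reduce(key, values);
    results.push({ _id: key, value });
  }
  return results;
}

// Same map as the shell example; Array.sum only exists in the mongo shell, so sum manually.
const map = function () { emit(this.name, this.price); };
const reduce = function (key, values) { return values.reduce((a, b) => a + b, 0); };

console.log(simulateMapReduce(books, map, reduce));
```

It should print the same three per-author totals as the shell query, listed in first-seen order rather than sorted by _id. Because reduce is skipped for single-value keys, 曹雪芹's 22 is stored exactly as emitted.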
As another example, to count how many books each author has published:
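```javascript
// Emit 1 for every book the author wrote
var map = function () { emit(this.name, 1); };
// Sum the 1s to get the number of books
var reduce = function (key, value) { return Array.sum(value); };
var options = { out: "bookNum" };
db.sang_books.mapReduce(map, reduce, options);
db.bookNum.find();
```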
The query results are as follows:
```
{ "_id" : "曹雪芹", "value" : 1.0 }
{ "_id" : "钱钟书", "value" : 2.0 }
{ "_id" : "鲁迅", "value" : 2.0 }
```
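Summing the emitted 1s with Array.sum, rather than counting the values, matters because MongoDB may call reduce more than once for the same key and feed an earlier reduce result back in as one of the values. reduce therefore has to return the same kind of value that map emits and give the same answer however the values are split across calls. The plain JavaScript sketch below uses a hypothetical three-book author and a hypothetical helper twoPassReduce to show that summing survives such a two-pass reduction while counting values.length does not.

```javascript
// Hypothetical author whose map emitted 1 for each of three books.
const emitted = [1, 1, 1];

// Reducer from the example (Array.sum replaced by a plain sum for Node.js).
const sumReduce = function (key, values) { return values.reduce((a, b) => a + b, 0); };
// A tempting but wrong reducer: it counts the values it receives.
const lengthReduce = function (key, values) { return values.length; };

// Hypothetical helper: reduce part of the values, then re-reduce that partial
// result together with the rest, as MongoDB is allowed to do.
function twoPassReduce(reduce) {
  const partial = reduce("someAuthor", emitted.slice(0, 2));
  return reduce("someAuthor", [partial, emitted[2]]);
}

console.log(twoPassReduce(sumReduce));    // 3: the partial count 2 plus the remaining 1
console.log(twoPassReduce(lengthReduce)); // 2: two inputs, so one book is lost
```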
To list each author's books:
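```javascript
// Emit each book title under its author's name
var map = function () { emit(this.name, this.book); };
// Join all titles for one author into a comma-separated string
var reduce = function (key, value) { return value.join(','); };
var options = { out: "books" };
db.sang_books.mapReduce(map, reduce, options);
db.books.find();
```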
The results are as follows:
```
{ "_id" : "曹雪芹", "value" : "红楼梦" }
{ "_id" : "钱钟书", "value" : "宋诗选注,谈艺录" }
{ "_id" : "鲁迅", "value" : "呐喊,彷徨" }
```
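Joining titles is also safe to re-reduce: a partially joined string is itself a valid value, and joining it again with the remaining titles yields the same string. 曹雪芹 has only one book, so reduce is never called for that key and 红楼梦 is stored as emitted. In the sketch below the titles A, B and C belong to a hypothetical author. It also shows a caveat: reduce is expected not to depend on the order of its values, and join does depend on it, so the order of titles in the joined string should not be relied on.

```javascript
// Reducer from the book-listing example: comma-joins the titles it receives.
const joinReduce = function (key, values) { return values.join(","); };

// Placeholder titles for a hypothetical author with three books.
const titles = ["A", "B", "C"];

// One reduce call over every title...
const singlePass = joinReduce("author", titles);
// ...versus reducing two titles first and then re-reducing the partial string.
const partial = joinReduce("author", titles.slice(0, 2));
const rePass = joinReduce("author", [partial, titles[2]]);

console.log(singlePass, rePass, singlePass === rePass); // A,B,C A,B,C true

// If the values arrive in a different order, the titles do too.
console.log(joinReduce("author", ["C", "A", "B"])); // C,A,B
```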
As another example, to list for each author only the books priced above ¥40:
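```javascript
var map = function () { emit(this.name, this.book); };
var reduce = function (key, value) { return value.join(','); };
// Only documents with price > 40 are passed to map
var options = { query: { price: { $gt: 40 } }, out: "books" };
db.sang_books.mapReduce(map, reduce, options);
db.books.find();
```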
The query option filters the documents of the collection before they are passed to map, so only the books priced above 40 are mapped and reduced.
The results are as follows:
```
{ "_id" : "钱钟书", "value" : "宋诗选注,谈艺录" }
{ "_id" : "鲁迅", "value" : "彷徨" }
```
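The effect of query can be imitated in plain JavaScript by filtering the documents before map ever sees them. In this Node.js sketch, the predicate doc.price > 40 stands in for { price: { $gt: 40 } }, and mapReduceWithQuery is a hypothetical helper, not a MongoDB API.

```javascript
// The five sang_books documents (_id and publisher omitted).
const books = [
  { name: "鲁迅", book: "呐喊", price: 38.0 },
  { name: "曹雪芹", book: "红楼梦", price: 22.0 },
  { name: "钱钟书", book: "宋诗选注", price: 99.0 },
  { name: "钱钟书", book: "谈艺录", price: 66.0 },
  { name: "鲁迅", book: "彷徨", price: 55.0 },
];

// Stand-in for the filter { price: { $gt: 40 } } used in the query option.
const query = (doc) => doc.price > 40;

// Hypothetical helper: filter first, then map, group and reduce in memory.
function mapReduceWithQuery(docs, query, map, reduce) {
  const groups = new Map(); // key -> emitted values
  globalThis.emit = (key, value) => {
    if (!groups.has(key)) groups.set(key, []);
    groups.get(key).push(value);
  };
  try {
    // Only documents that pass the filter are handed to map.
    for (const doc of docs.filter(query)) map.call(doc);
  } finally {
    delete globalThis.emit;
  }
  return [...groups].map(([key, values]) => ({
    _id: key,
    value: values.length === 1 ? values[0] : reduce(key, values),
  }));
}

const map = function () { emit(this.name, this.book); };
const reduce = function (key, value) { return value.join(","); };

console.log(mapReduceWithQuery(books, query, map, reduce));
```

曹雪芹 disappears from the output entirely because no document of that author passes the filter, which matches the shell result.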
MapReduce can also be executed with db.runCommand(), using the following format:
```javascript
db.runCommand(
  {
    mapReduce: <collection>,
    map: <function>,
    reduce: <function>,
    finalize: <function>,
    out: <output>,
    query: <document>,
    sort: <document>,
    limit: <number>,
    scope: <document>,
    jsMode: <boolean>,
    verbose: <boolean>,
    bypassDocumentValidation: <boolean>,
    collation: <document>
  }
)
```
The parameters mean the following:
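Parameter | Meaning |
---|---|
mapReduce | Name of the collection to operate on |
map | The map function |
reduce | The reduce function |
finalize | Final processing function, applied to each key's reduced result |
out | Where to write the results, e.g. the name of the output collection |
query | Filter applied to the input documents before they are passed to map |
sort | Sort order of the input documents before they are passed to map |
limit | Maximum number of input documents passed to map |
scope | Global variables accessible in map, reduce and finalize |
jsMode | Whether to skip converting intermediate data to BSON between map and reduce (default false) |
verbose | Whether to include timing information in the command's result |
bypassDocumentValidation | Whether to bypass document validation when writing the output |
collation | Collation (language-specific string comparison rules) to use for the operation |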
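The sort and limit options restrict which documents reach map: sort orders the input documents, and limit caps how many of them are processed. When the book-listing example runs on such a limited input, the result can look like the following, where 彷徨 is missing from 鲁迅's list because that document was not among those processed:

```
{ "_id" : "曹雪芹", "value" : "红楼梦" }
{ "_id" : "钱钟书", "value" : "宋诗选注,谈艺录" }
{ "_id" : "鲁迅", "value" : "呐喊" }
```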
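One input that reproduces exactly this result, stated here as an assumption because other sort and limit combinations are possible, is the book-listing map and reduce run over only the first four documents in insertion order, as a limit of 4 without a sort might select. This Node.js sketch applies a hypothetical sortAndLimit step (an optional comparator, then a cap on the count) before an in-memory map and reduce; both helpers are illustrations, not MongoDB APIs.

```javascript
// The five sang_books documents in insertion order (_id and publisher omitted).
const books = [
  { name: "鲁迅", book: "呐喊", price: 38.0 },
  { name: "曹雪芹", book: "红楼梦", price: 22.0 },
  { name: "钱钟书", book: "宋诗选注", price: 99.0 },
  { name: "钱钟书", book: "谈艺录", price: 66.0 },
  { name: "鲁迅", book: "彷徨", price: 55.0 },
];

// Hypothetical helper: optionally sort, then keep at most `limit` documents.
function sortAndLimit(docs, compare, limit) {
  const ordered = compare ? [...docs].sort(compare) : docs;
  return ordered.slice(0, limit);
}

// Hypothetical helper: in-memory map, group by key, reduce.
function simulateMapReduce(docs, map, reduce) {
  const groups = new Map(); // key -> emitted values
  globalThis.emit = (key, value) => {
    if (!groups.has(key)) groups.set(key, []);
    groups.get(key).push(value);
  };
  try {
    for (const doc of docs) map.call(doc);
  } finally {
    delete globalThis.emit;
  }
  return [...groups].map(([key, values]) => ({
    _id: key,
    value: values.length === 1 ? values[0] : reduce(key, values),
  }));
}

const map = function () { emit(this.name, this.book); };
const reduce = function (key, value) { return value.join(","); };

// Assumption: no sort, limit 4, documents read in insertion order.
console.log(simulateMapReduce(sortAndLimit(books, null, 4), map, reduce));
```

With a sort by price in descending order instead, the same limit keeps 宋诗选注, 谈艺录, 彷徨 and 呐喊, so 鲁迅 would get "彷徨,呐喊" and 曹雪芹 would drop out.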
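The finalize function does the final processing of each result: it receives a key together with that key's reduced value, and whatever it returns is stored in the output collection. For example, to store each result as an object with author and books fields:

```javascript
// finalize receives each key and its reduced value; wrap them in an object
var f1 = function (key, reduceValue) {
  var obj = {};
  obj.author = key;
  obj.books = reduceValue;
  return obj;
};
var map = function () { emit(this.name, this.book); };
var reduce = function (key, value) { return value.join(','); };
db.runCommand({ mapReduce: 'sang_books', map: map, reduce: reduce, out: "books", finalize: f1 });
db.books.find();
```

The results are as follows: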
```
{ "_id" : "曹雪芹", "value" : { "author" : "曹雪芹", "books" : "红楼梦" } }
{ "_id" : "钱钟书", "value" : { "author" : "钱钟书", "books" : "宋诗选注,谈艺录" } }
{ "_id" : "鲁迅", "value" : { "author" : "鲁迅", "books" : "呐喊,彷徨" } }
```
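Note that 曹雪芹's value is wrapped too, even though reduce never runs for a key with a single value: finalize is applied to every key. The Node.js sketch below imitates this with a hypothetical in-memory helper simulateMapReduce that takes an optional finalize argument; f1 is written as an object literal but builds the same author/books object as above.

```javascript
// The five sang_books documents (_id and publisher omitted).
const books = [
  { name: "鲁迅", book: "呐喊", price: 38.0 },
  { name: "曹雪芹", book: "红楼梦", price: 22.0 },
  { name: "钱钟书", book: "宋诗选注", price: 99.0 },
  { name: "钱钟书", book: "谈艺录", price: 66.0 },
  { name: "鲁迅", book: "彷徨", price: 55.0 },
];

// Hypothetical helper: in-memory map, group, reduce, then optional finalize.
function simulateMapReduce(docs, map, reduce, finalize) {
  const groups = new Map(); // key -> emitted values
  globalThis.emit = (key, value) => {
    if (!groups.has(key)) groups.set(key, []);
    groups.get(key).push(value);
  };
  try {
    for (const doc of docs) map.call(doc);
  } finally {
    delete globalThis.emit;
  }
  const results = [];
  for (const [key, values] of groups) {
    const reduced = values.length === 1 ? values[0] : reduce(key, values);
    // finalize runs once per key, including keys where reduce was skipped.
    results.push({ _id: key, value: finalize ? finalize(key, reduced) : reduced });
  }
  return results;
}

const f1 = function (key, reduceValue) { return { author: key, books: reduceValue }; };
const map = function () { emit(this.name, this.book); };
const reduce = function (key, value) { return value.join(","); };

console.log(JSON.stringify(simulateMapReduce(books, map, reduce, f1)));
```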
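The scope option defines global variables that map, reduce and finalize can all read. In the following example, the variable sang is supplied through scope and used in both reduce and finalize:

```javascript
// sang is not defined here; it is supplied through the scope option
var f1 = function (key, reduceValue) {
  var obj = {};
  obj.author = key;
  obj.books = reduceValue;
  obj.sang = sang;
  return obj;
};
var map = function () { emit(this.name, this.book); };
var reduce = function (key, value) { return value.join(',--' + sang + '--,'); };
db.runCommand({ mapReduce: 'sang_books', map: map, reduce: reduce, out: "books", finalize: f1, scope: { sang: "haha" } });
db.books.find();
```

The results are as follows: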
```
{ "_id" : "曹雪芹", "value" : { "author" : "曹雪芹", "books" : "红楼梦", "sang" : "haha" } }
{ "_id" : "钱钟书", "value" : { "author" : "钱钟书", "books" : "宋诗选注,--haha--,谈艺录", "sang" : "haha" } }
{ "_id" : "鲁迅", "value" : { "author" : "鲁迅", "books" : "呐喊,--haha--,彷徨", "sang" : "haha" } }
```
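One way to picture scope in plain JavaScript is to rebuild each function so that every scope key becomes a variable it can see. The Node.js sketch below does this with a hypothetical bindScope helper built on the Function constructor; it is an assumption about how to model the behavior, not how MongoDB injects scope variables. sang is deliberately left undefined at the top level, exactly as in the shell example.

```javascript
// The five sang_books documents (_id and publisher omitted).
const books = [
  { name: "鲁迅", book: "呐喊", price: 38.0 },
  { name: "曹雪芹", book: "红楼梦", price: 22.0 },
  { name: "钱钟书", book: "宋诗选注", price: 99.0 },
  { name: "钱钟书", book: "谈艺录", price: 66.0 },
  { name: "鲁迅", book: "彷徨", price: 55.0 },
];

// Hypothetical helper: recreate fn from its source so each scope key is a variable in its closure.
function bindScope(fn, scope) {
  const names = Object.keys(scope);
  const factory = new Function(...names, `return (${fn.toString()});`);
  return factory(...names.map((name) => scope[name]));
}

// Hypothetical helper: in-memory map, group, reduce, finalize with scope variables.
function simulateMapReduce(docs, { map, reduce, finalize, scope = {} }) {
  const [m, r, f] = [map, reduce, finalize].map((fn) => fn && bindScope(fn, scope));
  const groups = new Map(); // key -> emitted values
  globalThis.emit = (key, value) => {
    if (!groups.has(key)) groups.set(key, []);
    groups.get(key).push(value);
  };
  try {
    for (const doc of docs) m.call(doc);
  } finally {
    delete globalThis.emit;
  }
  const results = [];
  for (const [key, values] of groups) {
    const reduced = values.length === 1 ? values[0] : r(key, values);
    results.push({ _id: key, value: f ? f(key, reduced) : reduced });
  }
  return results;
}

// The functions from the scope example; `sang` only exists through scope.
const f1 = function (key, reduceValue) { return { author: key, books: reduceValue, sang: sang }; };
const map = function () { emit(this.name, this.book); };
const reduce = function (key, value) { return value.join(",--" + sang + "--,"); };

console.log(JSON.stringify(simulateMapReduce(books, { map, reduce, finalize: f1, scope: { sang: "haha" } })));
```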
The above is the detailed content of Use of MapReduce in MongoDB. For more information, please follow other related articles on the PHP Chinese website!