Table of Contents
runCommand implementation
Home Backend Development PHP Tutorial Use of MapReduce in MongoDB

Use of MapReduce in MongoDB

Dec 08, 2017 pm 02:26 PM
mapreduce mongodb use

Friends who have played Hadoop should be familiar with MapReduce. MapReduce is powerful and flexible. It can split a large problem into multiple small problems and send each small problem to different machines for processing. All machines After all calculations are completed, the calculation results are combined into a complete solution. This is called distributed computing. In this article, we will take a look at the use of MapReduce in MongoDB.

mapReduce

MapReduce in MongoDB can be used to implement more complex aggregation commands. Using MapReduce mainly implements two functions: map function and reduce function. The map function is used to generate a sequence of key-value pairs. The result of the map function is used as a parameter of the reduce function. Further statistics are done in the reduce function. For example, my data set is as follows:

{"_id" : ObjectId("59fa71d71fd59c3b2cd908d7"),"name" : "鲁迅","book" : "呐喊","price" : 38.0,"publisher" : "人民文学出版社"}
{"_id" : ObjectId("59fa71d71fd59c3b2cd908d8"),"name" : "曹雪芹","book" : "红楼梦","price" : 22.0,"publisher" : "人民文学出版社"}
{"_id" : ObjectId("59fa71d71fd59c3b2cd908d9"),"name" : "钱钟书","book" : "宋诗选注","price" : 99.0,"publisher" : "人民文学出版社"}
{"_id" : ObjectId("59fa71d71fd59c3b2cd908da"),"name" : "钱钟书","book" : "谈艺录","price" : 66.0,"publisher" : "三联书店"}
{"_id" : ObjectId("59fa71d71fd59c3b2cd908db"),"name" : "鲁迅","book" : "彷徨","price" : 55.0,"publisher" : "花城出版社"}
Copy after login

If I want to query each author The total price of the books published, the operation is as follows:

var map=function(){emit(this.name,this.price)}
var reduce=function(key,value){return Array.sum(value)}
var options={out:"totalPrice"}
db.sang_books.mapReduce(map,reduce,options);
db.totalPrice.find()
Copy after login

emit function is mainly used to implement grouping and receives two parameters. The first parameter represents the grouping field, and the second parameter represents the data to be counted. Reduce performs specific data processing operations and receives two parameters, corresponding to the two parameters of the emit method. Here, the sum function in Array is used to perform self-processing on the price field. Options defines a set for outputting the results. At that time, we The data will be queried in this collection. By default, this collection will be retained even after the database is restarted, and the data in the collection will be retained. The query results are as follows:

{
    "_id" : "曹雪芹",
    "value" : 22.0
}
{
    "_id" : "钱钟书",
    "value" : 165.0
}
{
    "_id" : "鲁迅",
    "value" : 93.0
}
Copy after login

For another example, I want to query how many books each author has published, as follows:

var map=function(){emit(this.name,1)}
var reduce=function(key,value){return Array.sum(value)}
var options={out:"bookNum"}
db.sang_books.mapReduce(map,reduce,options);
db.bookNum.find()
Copy after login

The query results are as follows:

{
    "_id" : "曹雪芹",
    "value" : 1.0
}
{
    "_id" : "钱钟书",
    "value" : 2.0
}
{
    "_id" : "鲁迅",
    "value" : 2.0
}
Copy after login

Put each author’s The books are listed as follows:

var map=function(){emit(this.name,this.book)}
var reduce=function(key,value){return value.join(',')}
var options={out:"books"}
db.sang_books.mapReduce(map,reduce,options);
db.books.find()
Copy after login

The results are as follows:

{
    "_id" : "曹雪芹",
    "value" : "红楼梦"
}
{
    "_id" : "钱钟书",
    "value" : "宋诗选注,谈艺录"
}
{
    "_id" : "鲁迅",
    "value" : "呐喊,彷徨"
}
Copy after login

For example, if you query the books that each person sells for more than ¥40:

var map=function(){emit(this.name,this.book)}
var reduce=function(key,value){return value.join(',')}
var options={query:{price:{$gt:40}},out:"books"}
db.sang_books.mapReduce(map,reduce,options);
db.books.find()
Copy after login

query means to check the found set Filter again.

The results are as follows:

{
    "_id" : "钱钟书",
    "value" : "宋诗选注,谈艺录"
}
{
    "_id" : "鲁迅",
    "value" : "彷徨"
}
Copy after login

runCommand implementation

We can also use the runCommand command to execute MapReduce. The format is as follows:

db.runCommand(
               {
                 mapReduce: <collection>,
                 map: <function>,
                 reduce: <function>,
                 finalize: <function>,
                 out: <output>,
                 query: <document>,
                 sort: <document>,
                 limit: <number>,
                 scope: <document>,
                 jsMode: <boolean>,
                 verbose: <boolean>,
                 bypassDocumentValidation: <boolean>,
                 collation: <document>
               }
             )
Copy after login

The meaning is as follows:

##reduce reduce functionfinalizeFinal processing functionoutOutput Collection of ##query##sortSort the resultslimitThe number of results returnedscopeSet the parameter value, the value set here is in map , visible in the reduce and finalize functionsjsMode Whether to convert the intermediate data of map execution from javascript object to BSON object, the default is falseverboseWhether to display detailed time statisticsbypassDocumentValidationWhether to bypass document validationcollationSome other collationsThe following operation means performing a MapReduce operation and limiting the number of items returned to the statistical collection, limiting the return After the number of items, the statistical operation is performed, as follows:
var map=function(){emit(this.name,this.book)}
var reduce=function(key,value){return value.join(',')}
db.runCommand({mapreduce:'sang_books',map,reduce,out:"books",limit:4,verbose:true})
db.books.find()
Copy after login
Parameter Meaning
mapReduce Represents the collection to be operated
map map function
Filter the results
The execution results are as follows:

{
    "_id" : "曹雪芹",
    "value" : "红楼梦"
}
{
    "_id" : "钱钟书",
    "value" : "宋诗选注,谈艺录"
}
{
    "_id" : "鲁迅",
    "value" : "呐喊"
}
Copy after login
My friends saw that one of Lu Xun’s books was missing, because limit first limits the collection of returned items. number, and then perform statistical operations.

The finalize operation represents the final processing function, as follows:

var f1 = function(key,reduceValue){var obj={};obj.author=key;obj.books=reduceValue; return obj}
var map=function(){emit(this.name,this.book)}
var reduce=function(key,value){return value.join(',')}
db.runCommand({mapreduce:'sang_books',map,reduce,out:"books",finalize:f1})
db.books.find()
Copy after login
f1 The first parameter key represents the first parameter in emit, and the second parameter represents the execution result of reduce. We can This result is reprocessed in f1, and the result is as follows:

{
    "_id" : "曹雪芹",
    "value" : {
        "author" : "曹雪芹",
        "books" : "红楼梦"
    }
}
{
    "_id" : "钱钟书",
    "value" : {
        "author" : "钱钟书",
        "books" : "宋诗选注,谈艺录"
    }
}
{
    "_id" : "鲁迅",
    "value" : {
        "author" : "鲁迅",
        "books" : "呐喊,彷徨"
    }
}
Copy after login
scope can be used to define a variable that is visible in map, reduce, and finalize, as follows:

var f1 = function(key,reduceValue){var obj={};obj.author=key;obj.books=reduceValue;obj.sang=sang; return obj}
var map=function(){emit(this.name,this.book)}
var reduce=function(key,value){return value.join(',--'+sang+'--,')}
db.runCommand({mapreduce:'sang_books',map,reduce,out:"books",finalize:f1,scope:{sang:"haha"}})
db.books.find()
Copy after login
The execution result is as follows :

{
    "_id" : "曹雪芹",
    "value" : {
        "author" : "曹雪芹",
        "books" : "红楼梦",
        "sang" : "haha"
    }
}
{
    "_id" : "钱钟书",
    "value" : {
        "author" : "钱钟书",
        "books" : "宋诗选注,--haha--,谈艺录",
        "sang" : "haha"
    }
}
{
    "_id" : "鲁迅",
    "value" : {
        "author" : "鲁迅",
        "books" : "呐喊,--haha--,彷徨",
        "sang" : "haha"
    }
}
Copy after login
I hope you will gain something from this article.

Related recommendations:

MongoDB mapreduce usage and PHP sample code

How to increase MongoDB MapReduce speed by 20 times

Implementing MapReduce in Oracle database

The above is the detailed content of Use of MapReduce in MongoDB. For more information, please follow other related articles on the PHP Chinese website!

Statement of this Website
The content of this article is voluntarily contributed by netizens, and the copyright belongs to the original author. This site does not assume corresponding legal responsibility. If you find any content suspected of plagiarism or infringement, please contact admin@php.cn

Hot AI Tools

Undresser.AI Undress

Undresser.AI Undress

AI-powered app for creating realistic nude photos

AI Clothes Remover

AI Clothes Remover

Online AI tool for removing clothes from photos.

Undress AI Tool

Undress AI Tool

Undress images for free

Clothoff.io

Clothoff.io

AI clothes remover

AI Hentai Generator

AI Hentai Generator

Generate AI Hentai for free.

Hot Article

R.E.P.O. Energy Crystals Explained and What They Do (Yellow Crystal)
2 weeks ago By 尊渡假赌尊渡假赌尊渡假赌
R.E.P.O. Best Graphic Settings
2 weeks ago By 尊渡假赌尊渡假赌尊渡假赌
R.E.P.O. How to Fix Audio if You Can't Hear Anyone
3 weeks ago By 尊渡假赌尊渡假赌尊渡假赌

Hot Tools

Notepad++7.3.1

Notepad++7.3.1

Easy-to-use and free code editor

SublimeText3 Chinese version

SublimeText3 Chinese version

Chinese version, very easy to use

Zend Studio 13.0.1

Zend Studio 13.0.1

Powerful PHP integrated development environment

Dreamweaver CS6

Dreamweaver CS6

Visual web development tools

SublimeText3 Mac version

SublimeText3 Mac version

God-level code editing software (SublimeText3)

What software is crystaldiskmark? -How to use crystaldiskmark? What software is crystaldiskmark? -How to use crystaldiskmark? Mar 18, 2024 pm 02:58 PM

CrystalDiskMark is a small HDD benchmark tool for hard drives that quickly measures sequential and random read/write speeds. Next, let the editor introduce CrystalDiskMark to you and how to use crystaldiskmark~ 1. Introduction to CrystalDiskMark CrystalDiskMark is a widely used disk performance testing tool used to evaluate the read and write speed and performance of mechanical hard drives and solid-state drives (SSD). Random I/O performance. It is a free Windows application and provides a user-friendly interface and various test modes to evaluate different aspects of hard drive performance and is widely used in hardware reviews

Which version is generally used for mongodb? Which version is generally used for mongodb? Apr 07, 2024 pm 05:48 PM

It is recommended to use the latest version of MongoDB (currently 5.0) as it provides the latest features and improvements. When selecting a version, you need to consider functional requirements, compatibility, stability, and community support. For example, the latest version has features such as transactions and aggregation pipeline optimization. Make sure the version is compatible with the application. For production environments, choose the long-term support version. The latest version has more active community support.

How to download foobar2000? -How to use foobar2000 How to download foobar2000? -How to use foobar2000 Mar 18, 2024 am 10:58 AM

foobar2000 is a software that can listen to music resources at any time. It brings you all kinds of music with lossless sound quality. The enhanced version of the music player allows you to get a more comprehensive and comfortable music experience. Its design concept is to play the advanced audio on the computer The device is transplanted to mobile phones to provide a more convenient and efficient music playback experience. The interface design is simple, clear and easy to use. It adopts a minimalist design style without too many decorations and cumbersome operations to get started quickly. It also supports a variety of skins and Theme, personalize settings according to your own preferences, and create an exclusive music player that supports the playback of multiple audio formats. It also supports the audio gain function to adjust the volume according to your own hearing conditions to avoid hearing damage caused by excessive volume. Next, let me help you

How to use Baidu Netdisk app How to use Baidu Netdisk app Mar 27, 2024 pm 06:46 PM

Cloud storage has become an indispensable part of our daily life and work nowadays. As one of the leading cloud storage services in China, Baidu Netdisk has won the favor of a large number of users with its powerful storage functions, efficient transmission speed and convenient operation experience. And whether you want to back up important files, share information, watch videos online, or listen to music, Baidu Cloud Disk can meet your needs. However, many users may not understand the specific use method of Baidu Netdisk app, so this tutorial will introduce in detail how to use Baidu Netdisk app. Users who are still confused can follow this article to learn more. ! How to use Baidu Cloud Network Disk: 1. Installation First, when downloading and installing Baidu Cloud software, please select the custom installation option.

The difference between nodejs and vuejs The difference between nodejs and vuejs Apr 21, 2024 am 04:17 AM

Node.js is a server-side JavaScript runtime, while Vue.js is a client-side JavaScript framework for creating interactive user interfaces. Node.js is used for server-side development, such as back-end service API development and data processing, while Vue.js is used for client-side development, such as single-page applications and responsive user interfaces.

How to use NetEase Mailbox Master How to use NetEase Mailbox Master Mar 27, 2024 pm 05:32 PM

NetEase Mailbox, as an email address widely used by Chinese netizens, has always won the trust of users with its stable and efficient services. NetEase Mailbox Master is an email software specially created for mobile phone users. It greatly simplifies the process of sending and receiving emails and makes our email processing more convenient. So how to use NetEase Mailbox Master, and what specific functions it has. Below, the editor of this site will give you a detailed introduction, hoping to help you! First, you can search and download the NetEase Mailbox Master app in the mobile app store. Search for "NetEase Mailbox Master" in App Store or Baidu Mobile Assistant, and then follow the prompts to install it. After the download and installation is completed, we open the NetEase email account and log in. The login interface is as shown below

BTCC tutorial: How to bind and use MetaMask wallet on BTCC exchange? BTCC tutorial: How to bind and use MetaMask wallet on BTCC exchange? Apr 26, 2024 am 09:40 AM

MetaMask (also called Little Fox Wallet in Chinese) is a free and well-received encryption wallet software. Currently, BTCC supports binding to the MetaMask wallet. After binding, you can use the MetaMask wallet to quickly log in, store value, buy coins, etc., and you can also get 20 USDT trial bonus for the first time binding. In the BTCCMetaMask wallet tutorial, we will introduce in detail how to register and use MetaMask, and how to bind and use the Little Fox wallet in BTCC. What is MetaMask wallet? With over 30 million users, MetaMask Little Fox Wallet is one of the most popular cryptocurrency wallets today. It is free to use and can be installed on the network as an extension

Where is the database created by mongodb? Where is the database created by mongodb? Apr 07, 2024 pm 05:39 PM

The data of the MongoDB database is stored in the specified data directory, which can be located in the local file system, network file system or cloud storage. The specific location is as follows: Local file system: The default path is Linux/macOS:/data/db, Windows: C:\data\db. Network file system: The path depends on the file system. Cloud Storage: The path is determined by the cloud storage provider.

See all articles