Efficient Indexing in MongoDB 2.6
By Osmar Olivo, Product Manager at MongoDB One of the most powerful features of MongoDB is its rich indexing functionality. Users can specify secondary indexes on any field, compound indexes, geospatial, text, sparse, TTL, and others. Havi
By Osmar Olivo, Product Manager at MongoDB
One of the most powerful features of MongoDB is its rich indexing functionality. Users can specify secondary indexes on any field, compound indexes, geospatial, text, sparse, TTL, and others. Having extensive indexing functionality makes it easier for developers to build apps that provide rich functionality and low latency.
MongoDB 2.6 introduces a new query planner, including the ability to perform index intersection. Prior to 2.6 the query planner could only make use of a single index for most queries. That meant that if you wanted to query on multiple fields together, you needed to create a compound index. It also meant that if there were several different combinations of fields you wanted to query on, you might need several different compound indexes.
Each index adds overhead to your deployment - indexes consume space, on disk and in RAM, and indexes are maintained during updates, which adds disk IO. In other words, indexes improve the efficiency of many operations, but they also come at a cost. For many applications, index intersection will allow users to reduce the number of indexes they need while still providing rich features and low latency.
In the following sections we will take a deep dive into index intersection and how it can be applied to applications.
An Example - The Phone Book
Let’s take the example of a phone book with the following schema.
{ FirstName LastName Phone_Number Address }
If I were to search for “Smith, John” how would I index the following query to be as efficient as possible?
db.phonebook.find({ FirstName : “John”, LastName : “Smith” })
I could use an individual index on FirstName and search for all of the “Johns”.
This would look something like ensureIndex( { FirstName : 1 } )
We run this query and we get back 200,000 John Smiths. Looking at the explain() output below however, we see that we scanned 1,000,000 “Johns” in the process of finding 200,000 “John Smiths”.
> db.phonebook.find({ FirstName : "John", LastName : "Smith"}).explain() { "cursor" : "BtreeCursor FirstName_1", "isMultiKey" : false, "n" : 200000, "nscannedObjects" : 1000000, "nscanned" : 1000000, "nscannedObjectsAllPlans" : 1000101, "nscannedAllPlans" : 1000101, "scanAndOrder" : false, "indexOnly" : false, "nYields" : 2, "nChunkSkips" : 0, "millis" : 2043, "indexBounds" : { "FirstName" : [ [ "John", "John" ] ] }, "server" : "Oz-Olivo-MacBook-Pro.local:27017" }
How about creating an individual index on LastName?
This would look something like ensureIndex( { LastName : 1 } )
Running this query we get back 200,000 “John Smiths” but our explain output says that we now scanned 400,000 “Smiths”. How can we make this better?
db.phonebook.find({ FirstName : "John", LastName : "Smith"}).explain() { "cursor" : "BtreeCursor LastName_1", "isMultiKey" : false, "n" : 200000, "nscannedObjects" : 400000, "nscanned" : 400000, "nscannedObjectsAllPlans" : 400101, "nscannedAllPlans" : 400101, "scanAndOrder" : false, "indexOnly" : false, "nYields" : 1, "nChunkSkips" : 0, "millis" : 852, "indexBounds" : { "LastName" : [ [ "Smith", "Smith" ] ] }, "server" : "Oz-Olivo-MacBook-Pro.local:27017" }
So we know that there are 1,000,000 “John” entries, 400,000 “Smith” entries, and 200,000 “John Smith” entries in our phonebook. Is there a way that we can scan just the 200,000 we need?
In the case of a phone book this is somewhat simple; since we know that we want it to be sorted by Lastname, Firstname we can create a compound index on them, like the below.
ensureIndex( { LastName : true, FirstName : 1 } ) db.phonebook.find({ FirstName : "John", LastName : "Smith"}).explain() { "cursor" : "BtreeCursor LastName_1_FirstName_1", "isMultiKey" : false, "n" : 200000, "nscannedObjects" : 200000, "nscanned" : 200000, "nscannedObjectsAllPlans" : 200000, "nscannedAllPlans" : 200000, "scanAndOrder" : false, "indexOnly" : false, "nYields" : 0, "nChunkSkips" : 0, "millis" : 370, "indexBounds" : { "LastName" : [ [ "Smith", "Smith" ] ], "FirstName" : [ [ "John", "John" ] ] }, "server" : "Oz-Olivo-MacBook-Pro.local:27017" }
Looking at the explain on this, we see that the index only scanned the 200,000 documents that matched, so we got a perfect hit.
Beyond Compound Indexes
The compound index is a great solution in the case of a phonebook in which we always know how we are going to be querying our data. Now what if we have an application in which users can arbitrarily query for different fields together? We can’t possibly create a compound index for every possible combination because of the overhead imposed by indexes, as we discussed above, and because MongoDB limits you to 64 indexes per collection. Index intersection can really help.
Imagine the case of a medical application which doctors use to filter through patients. At a high level, omitting several details, a basic schema may look something like the below.
{ Fname LName SSN Age Blood_Type Conditions : [] Medications : [ ] ... ... }
Some sample searches that a doctor/nurse may run on this system would look something like the below.
Find me a Patient with Blood_Type = O under the age of 50
db.patients.find( { Blood_Type : “O”, Age : { $lt : 50 } } )
Find me all patients over the age of 60 on Medication X
db.patients.find( { Medications : “X” , Age : { $gt : 60} })
Find me all Diabetic patients on medication Y
db.patients.find( { Conditions : “Diabetes”, Medications : “Y” } )
With all of the unstructured data in modern applications, along with the desire to be able to search for things as needed in an ad-hoc way, it can become very difficult to predict usage patterns. Since we can’t possibly create compound indexes for every combination of fields, because we don’t necessarily know what those will be ahead of time, we can try indexing individual fields to try to salvage some performance. But as shown above in our phone book application, this can lead to performance issues in which we pull documents into memory that are not matches.
To avoid the paging of unnecessary data, the new index intersection feature in 2.6 increases the overall efficiency of these types of ad-hoc queries by processing the indexes involved individually and then intersecting the result set to find the matching documents. This means you only pull the final matching documents into memory and everything else is processed using the indexes. This processing will utilize more CPU, but should greatly reduce the amount of IO done for queries where all of the data is not in memory as well as allow you to utilize your memory more efficiently.
For example, looking at the earlier example:
db.patients.find( { Blood_Type : “O”, Age : { $lt : 50 } } )
It is inefficient to find all patients with BloodType: O (which could be millions) and then pull into memory each document to find the ones with age
Instead, the query planner finds all patients with bloodType: O using the index on BloodType, and all patients with age
Index intersection allows for much more efficient use of existing RAM so less total memory will usually be required to fit the working set then previously. Also, if you had several compound indices that were made up of different combinations of fields, then you can reduce the total number of indexes on the system. This means storing less indices in memory as well as achieving better insert/update performance since fewer indices must be updated.
As of version 2.6.0, you cannot intersect with geo or text indices and you can intersect at most 2 separate indices with each other. These limitations are likely to change in a future release.
Optimizing Multi-key Indexes It is also possible to intersect an index with itself in the case of multi-key indexes. Consider the below query:
Find me all patients with Diabetes & High Blood Pressure
db.patients.find( { Conditions : { $all : [ “Diabetes”, “High Blood Pressure” ] } } )
In this case we will find the result set of all Patients with Diabetes, and the result set of all patients with High blood pressure, and intersect the two to get all patients with both. Again, this requires less memory and disk speed for better overall performance. As of the 2.6.0 release, an index can intersect with itself up to 10 times.
Do We Still Need Compound Indexes?
To be clear, compound indexing will ALWAYS be more performant IF you know what you are going to be querying on and can create one ahead of time. Furthermore, if your working set is entirely in memory, then you will not reap any of the benefits of Index Intersection as it is primarily based on reducing IO. But in a more ad-hoc case where one cannot predict the shape of the queries and the working set is much larger than available memory, index intersection will automatically take over and choose the most performant path.
- Download MongoDB 2.6 Today
- Learn about all of the key new features in MongoDB 2.6 by downloading the whitepaper
原文地址:Efficient Indexing in MongoDB 2.6, 感谢原作者分享。

热AI工具

Undresser.AI Undress
人工智能驱动的应用程序,用于创建逼真的裸体照片

AI Clothes Remover
用于从照片中去除衣服的在线人工智能工具。

Undress AI Tool
免费脱衣服图片

Clothoff.io
AI脱衣机

AI Hentai Generator
免费生成ai无尽的。

热门文章

热工具

记事本++7.3.1
好用且免费的代码编辑器

SublimeText3汉化版
中文版,非常好用

禅工作室 13.0.1
功能强大的PHP集成开发环境

Dreamweaver CS6
视觉化网页开发工具

SublimeText3 Mac版
神级代码编辑软件(SublimeText3)

推荐使用 MongoDB 最新版本(当前为 5.0),因为它提供了最新特性和改进。选择版本时,需考虑功能需求、兼容性、稳定性和社区支持,例如:最新版本具有事务、聚合管道优化等特性。确保版本与应用程序兼容。生产环境选择长期支持版本。最新版本有更活跃的社区支持。

Node.js 是一种服务器端 JavaScript 运行时,而 Vue.js 是一个客户端 JavaScript 框架,用于创建交互式用户界面。Node.js 用于服务器端开发,如后端服务 API 开发和数据处理,而 Vue.js 用于客户端开发,如单页面应用程序和响应式用户界面。

随着互联网的发展,人们的生活越来越数字化,个性化需求也越来越强烈。在这个信息爆炸的时代,用户往往面对海量的信息无从选择,所以实时推荐系统的重要性愈发凸显出来。本文将分享利用MongoDB实现实时推荐系统的经验,希望能为开发者们提供一些启发和帮助。一、MongoDB简介MongoDB是一个开源的NoSQL数据库,它以高性能、易扩展和灵活的数据模型而闻名。相比传

MongoDB 数据库的数据存储在指定的数据目录中,该目录可以位于本地文件系统、网络文件系统或云存储中,具体位置如下:本地文件系统:默认路径为 Linux/macOS:/data/db,Windows:C:\data\db。网络文件系统:路径取决于文件系统。云存储:路径由云存储提供商决定。

MongoDB 数据库以其灵活、可扩展和高性能而闻名。它的优势包括:文档数据模型,允许以灵活和非结构化的方式存储数据。水平可扩展性,可通过分片扩展到多个服务器。查询灵活性,支持复杂的查询和聚合操作。数据复制和容错,确保数据的冗余和高可用性。JSON 支持,便于与前端应用程序集成。高性能,即使处理大量数据也能实现快速响应。开源,可定制且免费使用。

MongoDB是一款面向文档的、分布式数据库系统,用于存储和管理大量结构化和非结构化数据。其核心概念包括文档存储和分布式,主要特性有动态模式、索引、聚集、映射-归约和复制。它广泛应用于内容管理系统、电子商务平台、社交媒体网站、物联网应用和移动应用开发等领域。

MongoDB 数据库文件位于 MongoDB 数据目录中,默认情况下为 /data/db,其中包含 .bson(文档数据)、ns(集合信息)、journal(写入操作记录)、wiredTiger(使用 WiredTiger 存储引擎时的数据)和 config(数据库配置信息)等文件。

在 Linux/macOS 上:创建数据目录并启动 "mongod" 服务。在 Windows 上:创建数据目录并从服务管理器中启动 MongoDB 服务。在 Docker 中:运行 "docker run" 命令。在其他平台上:请查阅 MongoDB 文档。验证方式:运行 "mongo" 命令以连接并查看服务器版本。
