首頁 資料庫 mysql教程 Efficient Indexing in MongoDB 2.6

Efficient Indexing in MongoDB 2.6

Jun 07, 2016 pm 04:35 PM
mongodb

By Osmar Olivo, Product Manager at MongoDB One of the most powerful features of MongoDB is its rich indexing functionality. Users can specify secondary indexes on any field, compound indexes, geospatial, text, sparse, TTL, and others. Havi

By Osmar Olivo, Product Manager at MongoDB

One of the most powerful features of MongoDB is its rich indexing functionality. Users can specify secondary indexes on any field, compound indexes, geospatial, text, sparse, TTL, and others. Having extensive indexing functionality makes it easier for developers to build apps that provide rich functionality and low latency.

MongoDB 2.6 introduces a new query planner, including the ability to perform index intersection. Prior to 2.6 the query planner could only make use of a single index for most queries. That meant that if you wanted to query on multiple fields together, you needed to create a compound index. It also meant that if there were several different combinations of fields you wanted to query on, you might need several different compound indexes.

Each index adds overhead to your deployment - indexes consume space, on disk and in RAM, and indexes are maintained during updates, which adds disk IO. In other words, indexes improve the efficiency of many operations, but they also come at a cost. For many applications, index intersection will allow users to reduce the number of indexes they need while still providing rich features and low latency.

In the following sections we will take a deep dive into index intersection and how it can be applied to applications.

An Example - The Phone Book

Let’s take the example of a phone book with the following schema.

{
    FirstName
    LastName
    Phone_Number
    Address
}
登入後複製

If I were to search for “Smith, John” how would I index the following query to be as efficient as possible?

db.phonebook.find({ FirstName : “John”, LastName : “Smith” })
登入後複製

I could use an individual index on FirstName and search for all of the “Johns”.

This would look something like ensureIndex( { FirstName : 1 } )

We run this query and we get back 200,000 John Smiths. Looking at the explain() output below however, we see that we scanned 1,000,000 “Johns” in the process of finding 200,000 “John Smiths”.

> db.phonebook.find({ FirstName : "John", LastName : "Smith"}).explain()
{
    "cursor" : "BtreeCursor FirstName_1",
    "isMultiKey" : false,
    "n" : 200000,
    "nscannedObjects" : 1000000,
    "nscanned" : 1000000,
    "nscannedObjectsAllPlans" : 1000101,
    "nscannedAllPlans" : 1000101,
    "scanAndOrder" : false,
    "indexOnly" : false,
    "nYields" : 2,
    "nChunkSkips" : 0,
    "millis" : 2043,
    "indexBounds" : {
        "FirstName" : [
            [
                "John",
                "John"
            ]
        ]
    },
    "server" : "Oz-Olivo-MacBook-Pro.local:27017"
}
登入後複製

How about creating an individual index on LastName?

This would look something like ensureIndex( { LastName : 1 } )

Running this query we get back 200,000 “John Smiths” but our explain output says that we now scanned 400,000 “Smiths”. How can we make this better?

db.phonebook.find({ FirstName : "John", LastName : "Smith"}).explain()
{
    "cursor" : "BtreeCursor LastName_1",
    "isMultiKey" : false,
    "n" : 200000,
    "nscannedObjects" : 400000,
    "nscanned" : 400000,
    "nscannedObjectsAllPlans" : 400101,
    "nscannedAllPlans" : 400101,
    "scanAndOrder" : false,
    "indexOnly" : false,
    "nYields" : 1,
    "nChunkSkips" : 0,
    "millis" : 852,
    "indexBounds" : {
        "LastName" : [
            [
                "Smith",
                "Smith"
            ]
        ]
    },
    "server" : "Oz-Olivo-MacBook-Pro.local:27017"
}
登入後複製

So we know that there are 1,000,000 “John” entries, 400,000 “Smith” entries, and 200,000 “John Smith” entries in our phonebook. Is there a way that we can scan just the 200,000 we need?

In the case of a phone book this is somewhat simple; since we know that we want it to be sorted by Lastname, Firstname we can create a compound index on them, like the below.

ensureIndex( {  LastName : true, FirstName : 1  } ) 
db.phonebook.find({ FirstName : "John", LastName : "Smith"}).explain()
{
    "cursor" : "BtreeCursor LastName_1_FirstName_1",
    "isMultiKey" : false,
    "n" : 200000,
    "nscannedObjects" : 200000,
    "nscanned" : 200000,
    "nscannedObjectsAllPlans" : 200000,
    "nscannedAllPlans" : 200000,
    "scanAndOrder" : false,
    "indexOnly" : false,
    "nYields" : 0,
    "nChunkSkips" : 0,
    "millis" : 370,
    "indexBounds" : {
        "LastName" : [
            [
                "Smith",
                "Smith"
            ]
        ],
        "FirstName" : [
            [
                "John",
                "John"
            ]
        ]
    },
    "server" : "Oz-Olivo-MacBook-Pro.local:27017"
}
登入後複製

Looking at the explain on this, we see that the index only scanned the 200,000 documents that matched, so we got a perfect hit.

Beyond Compound Indexes

The compound index is a great solution in the case of a phonebook in which we always know how we are going to be querying our data. Now what if we have an application in which users can arbitrarily query for different fields together? We can’t possibly create a compound index for every possible combination because of the overhead imposed by indexes, as we discussed above, and because MongoDB limits you to 64 indexes per collection. Index intersection can really help.

Imagine the case of a medical application which doctors use to filter through patients. At a high level, omitting several details, a basic schema may look something like the below.

{
      Fname
      LName
      SSN
      Age
      Blood_Type
      Conditions : [] 
      Medications : [ ]
      ...
      ...
}
登入後複製

Some sample searches that a doctor/nurse may run on this system would look something like the below.

Find me a Patient with Blood_Type = O under the age of 50

db.patients.find( {   Blood_Type : “O”,  Age : {   $lt : 50  }     } )
登入後複製
登入後複製

Find me all patients over the age of 60 on Medication X

db.patients.find( { Medications : “X” , Age : { $gt : 60} })
登入後複製

Find me all Diabetic patients on medication Y

db.patients.find( { Conditions : “Diabetes”, Medications : “Y” } )
登入後複製

With all of the unstructured data in modern applications, along with the desire to be able to search for things as needed in an ad-hoc way, it can become very difficult to predict usage patterns. Since we can’t possibly create compound indexes for every combination of fields, because we don’t necessarily know what those will be ahead of time, we can try indexing individual fields to try to salvage some performance. But as shown above in our phone book application, this can lead to performance issues in which we pull documents into memory that are not matches.

To avoid the paging of unnecessary data, the new index intersection feature in 2.6 increases the overall efficiency of these types of ad-hoc queries by processing the indexes involved individually and then intersecting the result set to find the matching documents. This means you only pull the final matching documents into memory and everything else is processed using the indexes. This processing will utilize more CPU, but should greatly reduce the amount of IO done for queries where all of the data is not in memory as well as allow you to utilize your memory more efficiently.

For example, looking at the earlier example:

db.patients.find( {   Blood_Type : “O”,  Age : {   $lt : 50  }     } )
登入後複製
登入後複製

It is inefficient to find all patients with BloodType: O (which could be millions) and then pull into memory each document to find the ones with age

Instead, the query planner finds all patients with bloodType: O using the index on BloodType, and all patients with age

Index intersection allows for much more efficient use of existing RAM so less total memory will usually be required to fit the working set then previously. Also, if you had several compound indices that were made up of different combinations of fields, then you can reduce the total number of indexes on the system. This means storing less indices in memory as well as achieving better insert/update performance since fewer indices must be updated.

As of version 2.6.0, you cannot intersect with geo or text indices and you can intersect at most 2 separate indices with each other. These limitations are likely to change in a future release.

Optimizing Multi-key Indexes It is also possible to intersect an index with itself in the case of multi-key indexes. Consider the below query:

Find me all patients with Diabetes & High Blood Pressure

db.patients.find( {  Conditions : { $all : [ “Diabetes”, “High Blood Pressure” ]  }    }  )
登入後複製

In this case we will find the result set of all Patients with Diabetes, and the result set of all patients with High blood pressure, and intersect the two to get all patients with both. Again, this requires less memory and disk speed for better overall performance. As of the 2.6.0 release, an index can intersect with itself up to 10 times.

Do We Still Need Compound Indexes?

To be clear, compound indexing will ALWAYS be more performant IF you know what you are going to be querying on and can create one ahead of time. Furthermore, if your working set is entirely in memory, then you will not reap any of the benefits of Index Intersection as it is primarily based on reducing IO. But in a more ad-hoc case where one cannot predict the shape of the queries and the working set is much larger than available memory, index intersection will automatically take over and choose the most performant path.

  • Download MongoDB 2.6 Today
  • Learn about all of the key new features in MongoDB 2.6 by downloading the whitepaper
本網站聲明
本文內容由網友自願投稿,版權歸原作者所有。本站不承擔相應的法律責任。如發現涉嫌抄襲或侵權的內容,請聯絡admin@php.cn

熱AI工具

Undresser.AI Undress

Undresser.AI Undress

人工智慧驅動的應用程序,用於創建逼真的裸體照片

AI Clothes Remover

AI Clothes Remover

用於從照片中去除衣服的線上人工智慧工具。

Undress AI Tool

Undress AI Tool

免費脫衣圖片

Clothoff.io

Clothoff.io

AI脫衣器

Video Face Swap

Video Face Swap

使用我們完全免費的人工智慧換臉工具,輕鬆在任何影片中換臉!

熱門文章

<🎜>:泡泡膠模擬器無窮大 - 如何獲取和使用皇家鑰匙
3 週前 By 尊渡假赌尊渡假赌尊渡假赌
北端:融合系統,解釋
3 週前 By 尊渡假赌尊渡假赌尊渡假赌
Mandragora:巫婆樹的耳語 - 如何解鎖抓鉤
3 週前 By 尊渡假赌尊渡假赌尊渡假赌

熱工具

記事本++7.3.1

記事本++7.3.1

好用且免費的程式碼編輯器

SublimeText3漢化版

SublimeText3漢化版

中文版,非常好用

禪工作室 13.0.1

禪工作室 13.0.1

強大的PHP整合開發環境

Dreamweaver CS6

Dreamweaver CS6

視覺化網頁開發工具

SublimeText3 Mac版

SublimeText3 Mac版

神級程式碼編輯軟體(SublimeText3)

熱門話題

Java教學
1664
14
CakePHP 教程
1423
52
Laravel 教程
1321
25
PHP教程
1269
29
C# 教程
1249
24
使用 Composer 解決推薦系統的困境:andres-montanez/recommendations-bundle 的實踐 使用 Composer 解決推薦系統的困境:andres-montanez/recommendations-bundle 的實踐 Apr 18, 2025 am 11:48 AM

在開發一個電商網站時,我遇到了一個棘手的問題:如何為用戶提供個性化的商品推薦。最初,我嘗試了一些簡單的推薦算法,但效果並不理想,用戶的滿意度也因此受到影響。為了提升推薦系統的精度和效率,我決定採用更專業的解決方案。最終,我通過Composer安裝了andres-montanez/recommendations-bundle,這不僅解決了我的問題,還大大提升了推薦系統的性能。可以通過一下地址學習composer:學習地址

如何在Debian上配置MongoDB自動擴容 如何在Debian上配置MongoDB自動擴容 Apr 02, 2025 am 07:36 AM

本文介紹如何在Debian系統上配置MongoDB實現自動擴容,主要步驟包括MongoDB副本集的設置和磁盤空間監控。一、MongoDB安裝首先,確保已在Debian系統上安裝MongoDB。使用以下命令安裝:sudoaptupdatesudoaptinstall-ymongodb-org二、配置MongoDB副本集MongoDB副本集確保高可用性和數據冗餘,是實現自動擴容的基礎。啟動MongoDB服務:sudosystemctlstartmongodsudosys

Navicat查看MongoDB數據庫密碼的方法 Navicat查看MongoDB數據庫密碼的方法 Apr 08, 2025 pm 09:39 PM

直接通過 Navicat 查看 MongoDB 密碼是不可能的,因為它以哈希值形式存儲。取回丟失密碼的方法:1. 重置密碼;2. 檢查配置文件(可能包含哈希值);3. 檢查代碼(可能硬編碼密碼)。

CentOS上GitLab的數據庫如何選擇 CentOS上GitLab的數據庫如何選擇 Apr 14, 2025 pm 04:48 PM

CentOS系統上GitLab數據庫部署指南選擇合適的數據庫是成功部署GitLab的關鍵步驟。 GitLab兼容多種數據庫,包括MySQL、PostgreSQL和MongoDB。本文將詳細介紹如何選擇並配置這些數據庫。數據庫選擇建議MySQL:一款廣泛應用的關係型數據庫管理系統(RDBMS),性能穩定,適用於大多數GitLab部署場景。 PostgreSQL:功能強大的開源RDBMS,支持複雜查詢和高級特性,適合處理大型數據集。 MongoDB:流行的NoSQL數據庫,擅長處理海

CentOS MongoDB備份策略是什麼 CentOS MongoDB備份策略是什麼 Apr 14, 2025 pm 04:51 PM

CentOS系統下MongoDB高效備份策略詳解本文將詳細介紹在CentOS系統上實施MongoDB備份的多種策略,以確保數據安全和業務連續性。我們將涵蓋手動備份、定時備份、自動化腳本備份以及Docker容器環境下的備份方法,並提供備份文件管理的最佳實踐。手動備份:利用mongodump命令進行手動全量備份,例如:mongodump-hlocalhost:27017-u用戶名-p密碼-d數據庫名稱-o/備份目錄此命令會將指定數據庫的數據及元數據導出到指定的備份目錄。

MongoDB 與關係數據庫:全面比較 MongoDB 與關係數據庫:全面比較 Apr 08, 2025 pm 06:30 PM

MongoDB與關係型數據庫:深度對比本文將深入探討NoSQL數據庫MongoDB與傳統關係型數據庫(如MySQL和SQLServer)的差異。關係型數據庫採用行和列的表格結構組織數據,而MongoDB則使用靈活的面向文檔模型,更適應現代應用的需求。主要區別數據結構:關係型數據庫使用預定義模式的表格存儲數據,表間關係通過主鍵和外鍵建立;MongoDB使用類似JSON的BSON文檔存儲在集合中,每個文檔結構可獨立變化,實現無模式設計。架構設計:關係型數據庫需要預先定義固定的模式;MongoDB支持

Debian MongoDB如何進行數據加密 Debian MongoDB如何進行數據加密 Apr 12, 2025 pm 08:03 PM

在Debian系統上為MongoDB數據庫加密,需要遵循以下步驟:第一步:安裝MongoDB首先,確保您的Debian系統已安裝MongoDB。如果沒有,請參考MongoDB官方文檔進行安裝:https://docs.mongodb.com/manual/tutorial/install-mongodb-on-debian/第二步:生成加密密鑰文件創建一個包含加密密鑰的文件,並設置正確的權限:ddif=/dev/urandomof=/etc/mongodb-keyfilebs=512

mongodb怎麼設置用戶 mongodb怎麼設置用戶 Apr 12, 2025 am 08:51 AM

要設置 MongoDB 用戶,請按照以下步驟操作:1. 連接到服務器並創建管理員用戶。 2. 創建要授予用戶訪問權限的數據庫。 3. 使用 createUser 命令創建用戶並指定其角色和數據庫訪問權限。 4. 使用 getUsers 命令檢查創建的用戶。 5. 可選地設置其他權限或授予用戶對特定集合的權限。

See all articles