Efficient Indexing in MongoDB 2.6
By Osmar Olivo, Product Manager at MongoDB
One of the most powerful features of MongoDB is its rich indexing functionality. Users can specify secondary indexes on any field, compound indexes, geospatial, text, sparse, TTL, and others. Having extensive indexing functionality makes it easier for developers to build apps that provide rich functionality and low latency.
MongoDB 2.6 introduces a new query planner, including the ability to perform index intersection. Prior to 2.6 the query planner could only make use of a single index for most queries. That meant that if you wanted to query on multiple fields together, you needed to create a compound index. It also meant that if there were several different combinations of fields you wanted to query on, you might need several different compound indexes.
Each index adds overhead to your deployment - indexes consume space, on disk and in RAM, and indexes are maintained during updates, which adds disk IO. In other words, indexes improve the efficiency of many operations, but they also come at a cost. For many applications, index intersection will allow users to reduce the number of indexes they need while still providing rich features and low latency.
In the following sections we will take a deep dive into index intersection and how it can be applied to applications.
An Example - The Phone Book
Let’s take the example of a phone book with the following schema.
{
    FirstName,
    LastName,
    Phone_Number,
    Address
}
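For instance, a hypothetical entry (the field values here are purely illustrative) might be inserted like this:
// A hypothetical phone book entry; values are made up for illustration
db.phonebook.insert({
    FirstName : "John",
    LastName : "Smith",
    Phone_Number : "555-0123",
    Address : "123 Main St, Palo Alto, CA"
})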
If I were to search for “Smith, John”, how would I index the collection so that the following query is as efficient as possible?
db.phonebook.find({ FirstName : "John", LastName : "Smith" })
I could use an individual index on FirstName and search for all of the “Johns”.
This would look something like:
db.phonebook.ensureIndex( { FirstName : 1 } )
We run this query and we get back 200,000 John Smiths. Looking at the explain() output below, however, we see that we scanned 1,000,000 “Johns” in the process of finding 200,000 “John Smiths”.
> db.phonebook.find({ FirstName : "John", LastName : "Smith"}).explain()
{
    "cursor" : "BtreeCursor FirstName_1",
    "isMultiKey" : false,
    "n" : 200000,
    "nscannedObjects" : 1000000,
    "nscanned" : 1000000,
    "nscannedObjectsAllPlans" : 1000101,
    "nscannedAllPlans" : 1000101,
    "scanAndOrder" : false,
    "indexOnly" : false,
    "nYields" : 2,
    "nChunkSkips" : 0,
    "millis" : 2043,
    "indexBounds" : {
        "FirstName" : [ [ "John", "John" ] ]
    },
    "server" : "Oz-Olivo-MacBook-Pro.local:27017"
}
How about creating an individual index on LastName?
This would look something like:
db.phonebook.ensureIndex( { LastName : 1 } )
Running this query we get back 200,000 “John Smiths” but our explain output says that we now scanned 400,000 “Smiths”. How can we make this better?
db.phonebook.find({ FirstName : "John", LastName : "Smith"}).explain()
{
    "cursor" : "BtreeCursor LastName_1",
    "isMultiKey" : false,
    "n" : 200000,
    "nscannedObjects" : 400000,
    "nscanned" : 400000,
    "nscannedObjectsAllPlans" : 400101,
    "nscannedAllPlans" : 400101,
    "scanAndOrder" : false,
    "indexOnly" : false,
    "nYields" : 1,
    "nChunkSkips" : 0,
    "millis" : 852,
    "indexBounds" : {
        "LastName" : [ [ "Smith", "Smith" ] ]
    },
    "server" : "Oz-Olivo-MacBook-Pro.local:27017"
}
So we know that there are 1,000,000 “John” entries, 400,000 “Smith” entries, and 200,000 “John Smith” entries in our phonebook. Is there a way that we can scan just the 200,000 we need?
In the case of a phone book this is somewhat simple; since we know that we want it to be sorted by LastName, FirstName, we can create a compound index on those two fields, like the below.
db.phonebook.ensureIndex( { LastName : 1, FirstName : 1 } )
db.phonebook.find({ FirstName : "John", LastName : "Smith"}).explain()
{
    "cursor" : "BtreeCursor LastName_1_FirstName_1",
    "isMultiKey" : false,
    "n" : 200000,
    "nscannedObjects" : 200000,
    "nscanned" : 200000,
    "nscannedObjectsAllPlans" : 200000,
    "nscannedAllPlans" : 200000,
    "scanAndOrder" : false,
    "indexOnly" : false,
    "nYields" : 0,
    "nChunkSkips" : 0,
    "millis" : 370,
    "indexBounds" : {
        "LastName" : [ [ "Smith", "Smith" ] ],
        "FirstName" : [ [ "John", "John" ] ]
    },
    "server" : "Oz-Olivo-MacBook-Pro.local:27017"
}
Looking at the explain on this, we see that the index only scanned the 200,000 documents that matched, so we got a perfect hit.
Beyond Compound Indexes
The compound index is a great solution in the case of a phonebook in which we always know how we are going to be querying our data. Now what if we have an application in which users can arbitrarily query for different fields together? We can’t possibly create a compound index for every possible combination because of the overhead imposed by indexes, as we discussed above, and because MongoDB limits you to 64 indexes per collection. This is where index intersection can really help.
Imagine the case of a medical application which doctors use to filter through patients. At a high level, omitting several details, a basic schema may look something like the below.
{
    Fname,
    LName,
    SSN,
    Age,
    Blood_Type,
    Conditions : [],
    Medications : [],
    ...
}
Some sample searches that a doctor/nurse may run on this system would look something like the below.
Find me a Patient with Blood_Type = O under the age of 50
db.patients.find( { Blood_Type : "O", Age : { $lt : 50 } } )
Find me all patients over the age of 60 on Medication X
db.patients.find( { Medications : "X", Age : { $gt : 60 } } )
Find me all Diabetic patients on medication Y
db.patients.find( { Conditions : "Diabetes", Medications : "Y" } )
With all of the unstructured data in modern applications, along with the desire to search for things in an ad-hoc way, it can become very difficult to predict usage patterns. Since we can’t possibly create compound indexes for every combination of fields, because we don’t necessarily know what those combinations will be ahead of time, we can try indexing individual fields to salvage some performance. But as shown above in our phone book application, this can lead to performance issues in which we pull documents into memory that are not matches.
To avoid paging in unnecessary data, the new index intersection feature in 2.6 increases the overall efficiency of these types of ad-hoc queries by processing the indexes involved individually and then intersecting the result sets to find the matching documents. This means you only pull the final matching documents into memory, and everything else is processed using the indexes. This approach uses more CPU, but it should greatly reduce the amount of IO done for queries where all of the data is not in memory, as well as allow you to use your memory more efficiently.
For example, consider the earlier query:
db.patients.find( { Blood_Type : "O", Age : { $lt : 50 } } )
It is inefficient to find all patients with Blood_Type : "O" (which could be millions) and then pull each of those documents into memory to check whether its Age is less than 50.
Instead, the query planner finds all patients with Blood_Type : "O" using the index on Blood_Type, and all patients with Age < 50 using the index on Age, and then intersects the two result sets to arrive at only the matching documents.
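As a minimal sketch, assuming nothing but single-field indexes on the two queried fields, the setup that makes this plan possible might look like:
// Single-field indexes that the 2.6 planner can intersect for this query
db.patients.ensureIndex( { Blood_Type : 1 } )
db.patients.ensureIndex( { Age : 1 } )

// Neither index alone covers both predicates; the planner may intersect them
db.patients.find( { Blood_Type : "O", Age : { $lt : 50 } } )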
Index intersection allows for much more efficient use of existing RAM, so less total memory will usually be required to fit the working set than previously. Also, if you had several compound indexes made up of different combinations of fields, you can reduce the total number of indexes on the system. This means storing fewer indexes in memory, as well as achieving better insert/update performance since fewer indexes must be updated.
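For illustration, here is a hypothetical consolidation based on the sample searches above; the compound indexes shown are ones you might have created pre-2.6, one per query shape:
// Before: one compound index per anticipated query shape
db.patients.ensureIndex( { Blood_Type : 1, Age : 1 } )
db.patients.ensureIndex( { Medications : 1, Age : 1 } )
db.patients.ensureIndex( { Conditions : 1, Medications : 1 } )

// After: single-field indexes the planner can combine as needed
db.patients.ensureIndex( { Blood_Type : 1 } )
db.patients.ensureIndex( { Age : 1 } )
db.patients.ensureIndex( { Medications : 1 } )
db.patients.ensureIndex( { Conditions : 1 } )
Whether the intersection plan actually wins for a given query still depends on the optimizer, so it is worth verifying with explain() before dropping any compound index.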
As of version 2.6.0, you cannot intersect with geo or text indexes, and you can intersect at most two separate indexes with each other. These limitations are likely to change in a future release.
Optimizing Multi-key Indexes
It is also possible to intersect an index with itself in the case of multi-key indexes. Consider the below query:
Find me all patients with Diabetes & High Blood Pressure
db.patients.find( { Conditions : { $all : [ "Diabetes", "High Blood Pressure" ] } } )
In this case we will find the result set of all patients with Diabetes and the result set of all patients with High Blood Pressure, and intersect the two to get all patients with both. Again, this requires less memory and less disk IO, for better overall performance. As of the 2.6.0 release, an index can intersect with itself up to 10 times.
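As a sketch of the setup, a single multi-key index on the Conditions array is all this requires:
// A multi-key index on the Conditions array field
db.patients.ensureIndex( { Conditions : 1 } )

// The planner can intersect this index with itself, once per $all term
db.patients.find( { Conditions : { $all : [ "Diabetes", "High Blood Pressure" ] } } )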
Do We Still Need Compound Indexes?
To be clear, compound indexing will ALWAYS be more performant IF you know what you are going to be querying on and can create one ahead of time. Furthermore, if your working set is entirely in memory, then you will not reap any of the benefits of Index Intersection as it is primarily based on reducing IO. But in a more ad-hoc case where one cannot predict the shape of the queries and the working set is much larger than available memory, index intersection will automatically take over and choose the most performant path.
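One way to see which path the optimizer chose is to compare explain() output; in 2.6, an index intersection plan reports a "Complex Plan" cursor. The hint() call below assumes the compound index from the earlier examples exists and simply forces it for comparison:
// Let the planner choose; an intersection plan shows "cursor" : "Complex Plan"
db.patients.find( { Blood_Type : "O", Age : { $lt : 50 } } ).explain()

// Force a specific index for comparison (assumes this compound index exists)
db.patients.find( { Blood_Type : "O", Age : { $lt : 50 } } ).hint( { Blood_Type : 1, Age : 1 } ).explain()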
- Download MongoDB 2.6 Today
- Learn about all of the key new features in MongoDB 2.6 by downloading the whitepaper