大部分開發者都知道加索引會快。但在實際過程中,我們常碰到一些疑問&困難:
本文嘗試去解釋索引的基本知識&解答上述的疑問。
大部分開發者接觸索引,大概知道索引類似書的目錄,你要找到想要的內容,透過目錄找到限定的關鍵字,進而找到對應的章節的pageno,再找到具體的內容。
在資料結構裡面,最簡單的索引實作類似hashmap,透過關鍵字key,映射到具體的位置,找到具體的內容。但除了hash的方式,還有多種的方式實作索引。
hash / b-tree / b -tree redis HSET / MongoDB&PostgreSQL / MySQL
hashmap
#一圖見b-tree & b -tree 差異:
演算法查找複雜度上來說:
至於為何MongoDB 的實作選擇b-tree 而非b - tree ?
網路上多篇文章有闡述,非本文重點。
index盡量儲存在內存,data 其次。
注意只保留必要的index,記憶體盡量用在刀刃上。
如果index memory 都接近佔滿memory,那麼就很容易讀到disk,速度就下來了。
insert/update/delete 會觸發rebalance tree,所以,增刪改數據,索引會觸發修改,性能會有損耗,索引不是越多越好。既然如此,就選取哪些欄位作為索引呢?當查詢用到這些條件,怎麼辦?
拿一個最簡單的hashmap來講,為什麼複雜度不是O(1),而是所謂接近 O(1)。因為有key 衝突/重複,DB 去找的時候,key 衝突的資料一大堆的話,還是得輪著繼續找。 b-tree 看鍵(key)的選擇也是如此。
因此一個大部分開發常犯的錯就是對沒有區分度的key建索引。例如:很多就只有集中類別的 type/status 的 documents count 達幾十萬以上的collection,通常這種索引沒什麼幫助。
如果不想多建多餘的索引,開發的同事在複合& 單一欄位選擇上有時候挺糾結的。
根據典型碰到的場景,來做幾個實驗:
這裡創建了個loans collection。簡化只有100個數據。這個是藉貸的表有_id, userId, status(借貸狀態), amount(金額).
db.loans.count()100
db.loans.find({ "userId" : "59e022d33f239800129c61c7", "status" : "repayed", }).explain() { "queryPlanner" : { "plannerVersion" : 1, "namespace" : "cashLoan.loans", "indexFilterSet" : false, "parsedQuery" : { "$and" : [ { "status" : { "$eq" : "repayed" } }, { "userId" : { "$eq" : "59e022d33f239800129c61c7" } } ] }, "queryHash" : "15D5A9A1", "planCacheKey" : "15D5A9A1", "winningPlan" : { "stage" : "COLLSCAN", "filter" : { "$and" : [ { "status" : { "$eq" : "repayed" } }, { "userId" : { "$eq" : "59e022d33f239800129c61c7" } } ] }, "direction" : "forward" }, "rejectedPlans" : [ ] }, "serverInfo" : { "host" : "RMBAP", "port" : 27017, "version" : "4.1.11", "gitVersion" : "1b8a9f5dc5c3314042b55e7415a2a25045b32a94" }, "ok" : 1 }
注意上面COLLSCAN 全表掃描了,因為沒有索引。接下來我們分別建立幾個索引。
step 1 先建立 {userId:1, status:1}
db.loans.createIndex({userId:1, status:1}) { "createdCollectionAutomatically" : false, "numIndexesBefore" : 1, "numIndexesAfter" : 2, "ok" : 1 }
db.loans.find({ "userId" : "59e022d33f239800129c61c7", "status" : "repayed", }).explain() { "queryPlanner" : { "plannerVersion" : 1, "namespace" : "cashLoan.loans", "indexFilterSet" : false, "parsedQuery" : { "$and" : [ { "status" : { "$eq" : "repayed" } }, { "userId" : { "$eq" : "59e022d33f239800129c61c7" } } ] }, "queryHash" : "15D5A9A1", "planCacheKey" : "BB87F2BA", "winningPlan" : { "stage" : "FETCH", "inputStage" : { "stage" : "IXSCAN", "keyPattern" : { "userId" : 1, "status" : 1 }, "indexName" : "userId_1_status_1", "isMultiKey" : false, "multiKeyPaths" : { "userId" : [ ], "status" : [ ] }, "isUnique" : false, "isSparse" : false, "isPartial" : false, "indexVersion" : 2, "direction" : "forward", "indexBounds" : { "userId" : [ "["59e022d33f239800129c61c7", "59e022d33f239800129c61c7"]" ], "status" : [ "["repayed", "repayed"]" ] } } }, "rejectedPlans" : [ ] }, "serverInfo" : { "host" : "RMBAP", "port" : 27017, "version" : "4.1.11", "gitVersion" : "1b8a9f5dc5c3314042b55e7415a2a25045b32a94" }, "ok" : 1 }
結果:如願命中 {userId:1, status:1} 作為 winning plan。
step2:再建立個典型的索引userId
db.loans.createIndex({userId:1}) { "createdCollectionAutomatically" : false, "numIndexesBefore" : 2, "numIndexesAfter" : 3, "ok" : 1 }
db.loans.find({ "userId" : "59e022d33f239800129c61c7", "status" : "repayed", }).explain() { "queryPlanner" : { "plannerVersion" : 1, "namespace" : "cashLoan.loans", "indexFilterSet" : false, "parsedQuery" : { "$and" : [ { "status" : { "$eq" : "repayed" } }, { "userId" : { "$eq" : "59e022d33f239800129c61c7" } } ] }, "queryHash" : "15D5A9A1", "planCacheKey" : "1B1A4861", "winningPlan" : { "stage" : "FETCH", "inputStage" : { "stage" : "IXSCAN", "keyPattern" : { "userId" : 1, "status" : 1 }, "indexName" : "userId_1_status_1", "isMultiKey" : false, "multiKeyPaths" : { "userId" : [ ], "status" : [ ] }, "isUnique" : false, "isSparse" : false, "isPartial" : false, "indexVersion" : 2, "direction" : "forward", "indexBounds" : { "userId" : [ "[\"59e022d33f239800129c61c7\", \"59e022d33f239800129c61c7\"]" ], "status" : [ "[\"repayed\", \"repayed\"]" ] } } }, "rejectedPlans" : [ { "stage" : "FETCH", "filter" : { "status" : { "$eq" : "repayed" } }, "inputStage" : { "stage" : "IXSCAN", "keyPattern" : { "userId" : 1 }, "indexName" : "userId_1", "isMultiKey" : false, "multiKeyPaths" : { "userId" : [ ] }, "isUnique" : false, "isSparse" : false, "isPartial" : false, "indexVersion" : 2, "direction" : "forward", "indexBounds" : { "userId" : [ "["59e022d33f239800129c61c7", "59e022d33f239800129c61c7"]" ] } } } ] }, "serverInfo" : { "host" : "RMBAP", "port" : 27017, "version" : "4.1.11", "gitVersion" : "1b8a9f5dc5c3314042b55e7415a2a25045b32a94" }, "ok" : 1 }
留意到DB 偵測到{userId:1, status:1} 為更優執行的方案.
db.loans.find({ "userId" : "59e022d33f239800129c61c7" }).explain() { "queryPlanner" : { "plannerVersion" : 1, "namespace" : "cashLoan.loans", "indexFilterSet" : false, "parsedQuery" : { "userId" : { "$eq" : "59e022d33f239800129c61c7" } }, "queryHash" : "B1777DBA", "planCacheKey" : "1F09D68E", "winningPlan" : { "stage" : "FETCH", "inputStage" : { "stage" : "IXSCAN", "keyPattern" : { "userId" : 1 }, "indexName" : "userId_1", "isMultiKey" : false, "multiKeyPaths" : { "userId" : [ ] }, "isUnique" : false, "isSparse" : false, "isPartial" : false, "indexVersion" : 2, "direction" : "forward", "indexBounds" : { "userId" : [ "["59e022d33f239800129c61c7", "59e022d33f239800129c61c7"]" ] } } }, "rejectedPlans" : [ { "stage" : "FETCH", "inputStage" : { "stage" : "IXSCAN", "keyPattern" : { "userId" : 1, "status" : 1 }, "indexName" : "userId_1_status_1", "isMultiKey" : false, "multiKeyPaths" : { "userId" : [ ], "status" : [ ] }, "isUnique" : false, "isSparse" : false, "isPartial" : false, "indexVersion" : 2, "direction" : "forward", "indexBounds" : { "userId" : [ "["59e022d33f239800129c61c7", "59e022d33f239800129c61c7"]" ], "status" : [ "[MinKey, MaxKey]" ] } } } ] }, "serverInfo" : { "host" : "RMBAP", "port" : 27017, "version" : "4.1.11", "gitVersion" : "1b8a9f5dc5c3314042b55e7415a2a25045b32a94" }, "ok" : 1 }
留意到DB 偵測到{userId:1} 為更優執行的方案,嗯~,如我們所料.
db.loans.find({ "status" : "repayed" }).explain() { "queryPlanner" : { "plannerVersion" : 1, "namespace" : "cashLoan.loans", "indexFilterSet" : false, "parsedQuery" : { "status" : { "$eq" : "repayed" } }, "queryHash" : "E6304EB6", "planCacheKey" : "7A94191B", "winningPlan" : { "stage" : "COLLSCAN", "filter" : { "status" : { "$eq" : "repayed" } }, "direction" : "forward" }, "rejectedPlans" : [ ] }, "serverInfo" : { "host" : "RMBAP", "port" : 27017, "version" : "4.1.11", "gitVersion" : "1b8a9f5dc5c3314042b55e7415a2a25045b32a94" }, "ok" : 1 }
有趣的部分:status不命中索引,全表掃描
接下來的步驟,加一sort :
db.loans.find({ "userId" : "59e022d33f239800129c61c7" }).sort({status:1}).explain() { "queryPlanner" : { "plannerVersion" : 1, "namespace" : "cashLoan.loans", "indexFilterSet" : false, "parsedQuery" : { "userId" : { "$eq" : "59e022d33f239800129c61c7" } }, "queryHash" : "F5ABB1AA", "planCacheKey" : "764CBAA8", "winningPlan" : { "stage" : "FETCH", "inputStage" : { "stage" : "IXSCAN", "keyPattern" : { "userId" : 1, "status" : 1 }, "indexName" : "userId_1_status_1", "isMultiKey" : false, "multiKeyPaths" : { "userId" : [ ], "status" : [ ] }, "isUnique" : false, "isSparse" : false, "isPartial" : false, "indexVersion" : 2, "direction" : "forward", "indexBounds" : { "userId" : [ "["59e022d33f239800129c61c7", "59e022d33f239800129c61c7"]" ], "status" : [ "[MinKey, MaxKey]" ] } } }, "rejectedPlans" : [ { "stage" : "SORT", "sortPattern" : { "status" : 1 }, "inputStage" : { "stage" : "SORT_KEY_GENERATOR", "inputStage" : { "stage" : "FETCH", "inputStage" : { "stage" : "IXSCAN", "keyPattern" : { "userId" : 1 }, "indexName" : "userId_1", "isMultiKey" : false, "multiKeyPaths" : { "userId" : [ ] }, "isUnique" : false, "isSparse" : false, "isPartial" : false, "indexVersion" : 2, "direction" : "forward", "indexBounds" : { "userId" : [ "["59e022d33f239800129c61c7", "59e022d33f239800129c61c7"]" ] } } } } } ] }, "serverInfo" : { "host" : "RMBAP", "port" : 27017, "version" : "4.1.11", "gitVersion" : "1b8a9f5dc5c3314042b55e7415a2a25045b32a94" }, "ok" : 1 }
有趣的部分:status 不命中索引
db.loans.find({ "status" : "repayed","userId" : "59e022d33f239800129c61c7", }).explain() { "queryPlanner" : { "plannerVersion" : 1, "namespace" : "cashLoan.loans", "indexFilterSet" : false, "parsedQuery" : { "$and" : [ { "status" : { "$eq" : "repayed" } }, { "userId" : { "$eq" : "59e022d33f239800129c61c7" } } ] }, "queryHash" : "15D5A9A1", "planCacheKey" : "1B1A4861", "winningPlan" : { "stage" : "FETCH", "inputStage" : { "stage" : "IXSCAN", "keyPattern" : { "userId" : 1, "status" : 1 }, "indexName" : "userId_1_status_1", "isMultiKey" : false, "multiKeyPaths" : { "userId" : [ ], "status" : [ ] }, "isUnique" : false, "isSparse" : false, "isPartial" : false, "indexVersion" : 2, "direction" : "forward", "indexBounds" : { "userId" : [ "[\"59e022d33f239800129c61c7\", \"59e022d33f239800129c61c7\"]" ], "status" : [ "[\"repayed\", \"repayed\"]" ] } } }, "rejectedPlans" : [ { "stage" : "FETCH", "filter" : { "status" : { "$eq" : "repayed" } }, "inputStage" : { "stage" : "IXSCAN", "keyPattern" : { "userId" : 1 }, "indexName" : "userId_1", "isMultiKey" : false, "multiKeyPaths" : { "userId" : [ ] }, "isUnique" : false, "isSparse" : false, "isPartial" : false, "indexVersion" : 2, "direction" : "forward", "indexBounds" : { "userId" : [ "["59e022d33f239800129c61c7", "59e022d33f239800129c61c7"]" ] } } } ] }, "serverInfo" : { "host" : "RMBAP", "port" : 27017, "version" : "4.1.11", "gitVersion" : "1b8a9f5dc5c3314042b55e7415a2a25045b32a94" }, "ok" : 1 }
命中索引,跟query的各個欄位順序不相關,如我們猜測。
有趣部分再來, 我們刪除索引{userId:1}
db.loans.dropIndex({"userId":1}) { "nIndexesWas" : 3, "ok" : 1 } db.loans.find({"userId" : "59e022d33f239800129c61c7", }).explain() { "queryPlanner" : { "plannerVersion" : 1, "namespace" : "cashLoan.loans", "indexFilterSet" : false, "parsedQuery" : { "userId" : { "$eq" : "59e022d33f239800129c61c7" } }, "queryHash" : "B1777DBA", "planCacheKey" : "5776AB9C", "winningPlan" : { "stage" : "FETCH", "inputStage" : { "stage" : "IXSCAN", "keyPattern" : { "userId" : 1, "status" : 1 }, "indexName" : "userId_1_status_1", "isMultiKey" : false, "multiKeyPaths" : { "userId" : [ ], "status" : [ ] }, "isUnique" : false, "isSparse" : false, "isPartial" : false, "indexVersion" : 2, "direction" : "forward", "indexBounds" : { "userId" : [ "["59e022d33f239800129c61c7", "59e022d33f239800129c61c7"]" ], "status" : [ "[MinKey, MaxKey]" ] } } }, "rejectedPlans" : [ ] }, "serverInfo" : { "host" : "RMBAP", "port" : 27017, "version" : "4.1.11", "gitVersion" : "1b8a9f5dc5c3314042b55e7415a2a25045b32a94" }, "ok" : 1 }
DB 執行分析器覺得索引{userId:1, status:1} 能更優,沒有命中複合索引,這個是因為status不是leading field。
db.loans.find({ "status" : "repayed" }).explain() { "queryPlanner" : { "plannerVersion" : 1, "namespace" : "cashLoan.loans", "indexFilterSet" : false, "parsedQuery" : { "status" : { "$eq" : "repayed" } }, "queryHash" : "E6304EB6", "planCacheKey" : "7A94191B", "winningPlan" : { "stage" : "COLLSCAN", "filter" : { "status" : { "$eq" : "repayed" } }, "direction" : "forward" }, "rejectedPlans" : [ ] }, "serverInfo" : { "host" : "RMBAP", "port" : 27017, "version" : "4.1.11", "gitVersion" : "1b8a9f5dc5c3314042b55e7415a2a25045b32a94" }, "ok" : 1 }
再換個角度sort 一遍, 與前面query & sort互換,之前是:
db.loans.find({userId:1}).sort({ "status" : "repayed" })
看看有啥不一樣?
db.loans.find({ "status" : "repayed" }).sort({userId:1}).explain() { "queryPlanner" : { "plannerVersion" : 1, "namespace" : "cashLoan.loans", "indexFilterSet" : false, "parsedQuery" : { "status" : { "$eq" : "repayed" } }, "queryHash" : "56EA6313", "planCacheKey" : "2CFCDA7F", "winningPlan" : { "stage" : "FETCH", "filter" : { "status" : { "$eq" : "repayed" } }, "inputStage" : { "stage" : "IXSCAN", "keyPattern" : { "userId" : 1, "status" : 1 }, "indexName" : "userId_1_status_1", "isMultiKey" : false, "multiKeyPaths" : { "userId" : [ ], "status" : [ ] }, "isUnique" : false, "isSparse" : false, "isPartial" : false, "indexVersion" : 2, "direction" : "forward", "indexBounds" : { "userId" : [ "[MinKey, MaxKey]" ], "status" : [ "[MinKey, MaxKey]" ] } } }, "rejectedPlans" : [ ] }, "serverInfo" : { "host" : "RMBAP", "port" : 27017, "version" : "4.1.11", "gitVersion" : "1b8a9f5dc5c3314042b55e7415a2a25045b32a94" }, "ok" : 1 }
如猜測,命中索引。
再來玩玩,確認下leading filed試驗:
db.loans.dropIndex("userId_1_status_1") { "nIndexesWas" : 2, "ok" : 1 }
db.loans.getIndexes() [ { "v" : 2, "key" : { "id" : 1 }, "name" : "id_", "ns" : "cashLoan.loans" } ]
db.loans.createIndex({status:1, userId:1}) { "createdCollectionAutomatically" : false, "numIndexesBefore" : 1, "numIndexesAfter" : 2, "ok" : 1 }
db.loans.getIndexes() [ { "v" : 2, "key" : { "id" : 1 }, "name" : "id_", "ns" : "cashLoan.loans" }, { "v" : 2, "key" : { "status" : 1, "userId" : 1 }, "name" : "status_1_userId_1", "ns" : "cashLoan.loans" } ]
db.loans.find({ "status" : "repayed" }).explain() { "queryPlanner" : { "plannerVersion" : 1, "namespace" : "cashLoan.loans", "indexFilterSet" : false, "parsedQuery" : { "status" : { "$eq" : "repayed" } }, "queryHash" : "E6304EB6", "planCacheKey" : "7A94191B", "winningPlan" : { "stage" : "FETCH", "inputStage" : { "stage" : "IXSCAN", "keyPattern" : { "status" : 1, "userId" : 1 }, "indexName" : "status_1_userId_1", "isMultiKey" : false, "multiKeyPaths" : { "status" : [ ], "userId" : [ ] }, "isUnique" : false, "isSparse" : false, "isPartial" : false, "indexVersion" : 2, "direction" : "forward", "indexBounds" : { "status" : [ "["repayed", "repayed"]" ], "userId" : [ "[MinKey, MaxKey]" ] } } }, "rejectedPlans" : [ ] }, "serverInfo" : { "host" : "RMBAP", "port" : 27017, "version" : "4.1.11", "gitVersion" : "1b8a9f5dc5c3314042b55e7415a2a25045b32a94" }, "ok" : 1 }
db.loans.getIndexes() [ { "v" : 2, "key" : { "id" : 1 }, "name" : "id_", "ns" : "cashLoan.loans" }, { "v" : 2, "key" : { "status" : 1, "userId" : 1 }, "name" : "status_1_userId_1", "ns" : "cashLoan.loans" } ]
db.loans.find({"userId" : "59e022d33f239800129c61c7", }).explain() { "queryPlanner" : { "plannerVersion" : 1, "namespace" : "cashLoan.loans", "indexFilterSet" : false, "parsedQuery" : { "userId" : { "$eq" : "59e022d33f239800129c61c7" } }, "queryHash" : "B1777DBA", "planCacheKey" : "5776AB9C", "winningPlan" : { "stage" : "COLLSCAN", "filter" : { "userId" : { "$eq" : "59e022d33f239800129c61c7" } }, "direction" : "forward" }, "rejectedPlans" : [ ] }, "serverInfo" : { "host" : "RMBAP", "port" : 27017, "version" : "4.1.11", "gitVersion" : "1b8a9f5dc5c3314042b55e7415a2a25045b32a94" }, "ok" : 1 }
看完这个试验,明白了 {userId:1, status:1} vs {status:1,userId:1} 的差别了吗?
PS:这个case 里面其实status 区分度不高,这里只是作为实例展示。
DB 一般都有执行器优化的分析,MySQL & MongoDB 都是 用explain 来做分析。
语法上MySQL :
explain your_sql
MongoDB:
yoursql.explain()
总结典型:理想的查询是结合explain 的指标,他们通常是多个的混合:
文末,还有最开头1个问题没回答:如果我的索引改加的都加了,还不够快,怎么办?
留个悬念,之后再写一篇。
更多PHP相关技术文章,请访问PHP教程栏目进行学习!
以上是MongoDB 索引的最佳實踐的詳細內容。更多資訊請關注PHP中文網其他相關文章!