Schema Design for Social Inboxes in MongoDB
Designing a schema is a critical part of any application. Like most databases, there are many options for modeling data in MongoDB, and it is important to incorporate the functional requirements and performance goals for your application w
Designing a schema is a critical part of any application. Like most databases, there are many options for modeling data in MongoDB, and it is important to incorporate the functional requirements and performance goals for your application when determining the best design. In this post, we’ll explore three approaches for using MongoDB when creating social inboxes or message timelines.
If you’re building a social network, like Twitter for example, you need to design a schema that is efficient for users viewing their inbox, as well as users sending messages to all their followers. The whole point of social media, after all, is that you can connect in real time.
There are several design considerations for this kind of application:
- The application needs to support a potentially large volume of reads and writes.
- Reads and writes are not uniformly distributed across users. Some users post much more frequently than others, and some users have many, many more followers than others.
- The application must provide a user experience that is instantaneous.
- Edit 11/6: The application will have little to no user deletions of data (a follow up blog post will include information about user deletions and historical data)
Because we are designing an application that needs to support a large volume of reads and writes we will be using a sharded collection for the messages. All three designs include the concept of “fan out,” which refers to distributing the work across the shards in parallel:
- Fan out on Read
- Fan out on Write
- Fan out on Write with Buckets
Each approach presents trade-offs, and you should use the design that is best for your application’s requirements.
The first design you might consider is called Fan Out on Read. When a user sends a message, it is simply saved to the inbox collection. When any user views their own inbox, the application queries for all messages that include the user as a recipient. The messages are returned in descending date order so that users can see the most recent messages.
To implement this design, create a sharded collection called inbox
, specifying the from
field as the shard key, which represents the address sending the message. You can then add a compound index on the to field and the sent field. Once the document is saved into the inbox, the message is effectively sent to all the recipients. With this approach sending messages is very efficient.
Viewing an inbox, on the other hand, is less efficient. When a user views their inbox the application issues a find command based on the to
field, sorted by sent. Because the inbox collection uses from
as its shard key, messages are grouped by sender across the shards. In MongoDB queries that are not based on the shard key will be routed to all shards. Therefore, each inbox view will be routed to all shards in the system. As the system scales and many users go to view their inbox, all queries will be routed to all shards. This design does not scale as well as each query being routed to a single shard.
With the “Fan Out on Read” method, sending a message is very efficient, but viewing the inbox is less efficient.
Fan out on Read is very efficient for sending messages, but less efficient for reading messages. If the majority of your application consists of users sending messages, but very few go to read what anyone sends them — let’s call it an anti-social app — then this design might work well. However, for most social apps there are more requests by users to view their inbox than there are to send messages.
The Fan out on Write takes a different approach that is more optimized for viewing inboxes. This time, instead of sharding our inbox collection on the sender, we shard on the message recipient. In this way, when we go to view an inbox the queries can be routed to a single shard, which will scale very well. Our message document is the same, but now save a copy of the message for every recipient.
With the “Fan Out on Write” method, viewing the inbox is efficient, but sending messages consumes more resources.
In practice we might implement the saving of messages asynchronously. Imagine two celebrities quickly exchange messages at a high-profile event - the system could quickly be saturated with millions of writes. By saving a first copy of their message, then using a pool of background workers to write copies to all followers, we can ensure the two celebrities can exchange messages quickly, and that followers will soon have their own copies. Furthermore, we could maintain a last-viewed date on the user document to ensure they have accessed the system recently - zombie accounts probably shouldn’t get a copy of the message, and for users that haven’t accessed their account recently we could always resort to our first design - Fan out on Read - to repopulate their inbox. Subsequent requests would then be fast again.
At this point we have improved the design for viewing inboxes by routing each inbox view to a single shard. However, each message in the user’s inbox will produce a random read operation. If each inbox view produces 50 random reads, then it only takes a relatively modest number of concurrent users to potentially saturate the disks. Fortunately we can take advantage of the document data model to further optimize this design to be even more efficient.
Fan out on Write with Buckets refines the Fan Out on Write design by “bucketing” messages together into documents of 50 messages ordered by time. When a user views their inbox the request can be fulfilled by reading just a few documents of 50 messages each instead of performing many random reads. Because read time is dominated by seek time, reducing the number of seeks can provide a major performance improvement to the application. Another advantage to this approach is that there are fewer index entries.
To implement this design we create two collections, an inbox collection and a user collection. The inbox collection uses two fields for the shard key, owner
and sequence
, which holds the owner’s user id and sequence number (i.e. the id of 50-message “bucket” documents in their inbox). The user collection contains simple user documents for tracking the total number of messages in their inbox. Since we will probably need to show the total number of messages for a user in a variety of places in our application, this is a nice place to maintain the count instead of calculating for each request. Our message document is the same as in the prior examples.
To send a message we iterate through the list of recipients as we did in the Fan out on Write example, but we also take another step to increment the count of total messages in the inbox of the recipient, which is maintained on the user document. Once we know the count of messages, we know the “bucket” in which to add the latest message. As these messages reach the 50 item threshold, the sequence number increments and we begin to add messages to the next “bucket” document. The most recent messages will always be in the “bucket” document with the highest sequence number. Viewing the most recent 50 messages for a user’s inbox is at most two reads; viewing the most recent 100 messages is at most three reads.
Normally a user’s entire inbox will exist on a single shard. However, it is possible that a few user inboxes could be spread across two shards. Because our application will probably page through a user’s inbox, it is still likely that every query for these few users will be routed to a single shard.
Fan out on Write with Buckets is generally the most scalable approach of the these three designs for social inbox applications. Every design presents different trade-offs. In this case viewing a user’s inbox is very efficient, but writes are somewhat more complex, and more disk space is consumed. For many applications these are the right trade-offs to make.
Schema design is one of the most important optimizations you can make for your application. We have a number of additional resources available on schema design if you are interested in learning more:
Shard per recipient Multiple writes |
Shard per recipient Appends (grows) |
||
Broadcast all shards Random reads |
Single shard Random reads |
Single shard Single read |
|
Message stored once |
Copy per recipient |
Copy per recipient |
Schema design is one of the most important optimizations you can make for your application. We have a number of additional resources available on schema design if you are interested in learning more:
- Check out the recording of our recent Schema Design webinar on this topic.
- Schema Design in general in the MongoDB Manual
- You can also view our schema design resources on the MongoDB docs page
- If you have any schema design questions, please view the archived questions on our user forum or ask a question yourselfon the MongoDB User Forum.
原文地址:Schema Design for Social Inboxes in MongoDB, 感谢原作者分享。

熱AI工具

Undresser.AI Undress
人工智慧驅動的應用程序,用於創建逼真的裸體照片

AI Clothes Remover
用於從照片中去除衣服的線上人工智慧工具。

Undress AI Tool
免費脫衣圖片

Clothoff.io
AI脫衣器

Video Face Swap
使用我們完全免費的人工智慧換臉工具,輕鬆在任何影片中換臉!

熱門文章

熱工具

記事本++7.3.1
好用且免費的程式碼編輯器

SublimeText3漢化版
中文版,非常好用

禪工作室 13.0.1
強大的PHP整合開發環境

Dreamweaver CS6
視覺化網頁開發工具

SublimeText3 Mac版
神級程式碼編輯軟體(SublimeText3)

Kernelsecuritycheckfailure(內核檢查失敗)就是一個比較常見的停止代碼類型,可藍屏錯誤出現不管是什麼原因都讓很多的有用戶們十分的苦惱,下面就讓本站來為用戶們來仔細的介紹一下17種解決方法吧。 kernel_security_check_failure藍色畫面的17種解決方法方法1:移除全部外部裝置當您使用的任何外部裝置與您的Windows版本不相容時,則可能會發生Kernelsecuritycheckfailure藍色畫面錯誤。為此,您需要在嘗試重新啟動電腦之前拔下全部外部裝置。

上週尚未宣布三款新產品將於 2024 年 7 月 8 日發布:CMF Watch Pro 2、CMF Phone 1 和 CMF Buds Pro 2。

Win10skype可以卸載嗎是許多用戶都想知道的問題,因為很多的用戶發現自己電腦上的預設程式上有這個應用,擔心刪除後會影響到系統的運行,下面就讓本站來為用戶們來仔細的介紹一下Win10如何卸載SkypeforBusiness吧。 Win10如何解除安裝SkypeforBusiness1、在電腦桌面點選Windows圖標,再點選設定圖示進入。 2、點選“應用”。 3、在搜尋框中輸入“Skype”,點選選取找到的結果。 4、點選「卸載」。 5

除了推出時提供的時尚但易於染色的白色 Cybertruck 內裝外,特斯拉現在還推出了更多顏色的電動皮卡。由於採用幸運的新品,新款戰術灰色 Cybertruck 內裝顏色首次預覽

用for求n階乘的方法:1.使用「for (var i=1;i<=n;i++){}」語句控制迴圈遍歷範圍為「1~n」;2、迴圈體中,使用「cj *=i」將1到n的數相乘,乘積賦值給變數cj;3、循環結束後,變數cj的值就n的階乘,輸出即可。

HMD Global 將於 7 月 10 日推出 Skyline,推出諾基亞 Lumia 920 風格的中端智慧型手機。

區別:1、for透過索引來循環遍歷每一個資料元素,而forEach透過JS底層程式來循環遍歷數組的資料元素;2、for可以透過break關鍵字來終止迴圈的執行,而forEach不可以;3、 for可以透過控制迴圈變數的數值來控制迴圈的執行,而forEach不行;4、for在迴圈外可以呼叫迴圈變量,而forEach在迴圈外不能呼叫迴圈變數;5、for的執行效率要高於forEach。

引言實際的業務項目開發中,大家應該對從給定的list中剔除不符合條件的元素這個操作不陌生吧?很多同學可以立刻想出很多種實現的方式,但你想到的這些實現方式都是人畜無害的嗎?很多看似正常的操作其實背後是個陷阱,很多新手可能稍不留神就會掉進去。倘若不幸踩中:程式碼運行時直接拋異常報錯,這個算是不幸中的萬幸,至少可以及時發現並去解決程式碼運行不報錯,但是業務邏輯莫名其妙的出現各種奇怪問題,這種就比較悲劇了,因為這個問題稍不留神的話,可能就會讓後續業務埋下隱憂。那麼,到底有哪些實現方式呢?哪些實現方式可能會
