首页 数据库 mysql教程 Schema Design for Social Inboxes in MongoDB

Schema Design for Social Inboxes in MongoDB

Jun 07, 2016 pm 04:32 PM
Design for social

Designing a schema is a critical part of any application. Like most databases, there are many options for modeling data in MongoDB, and it is important to incorporate the functional requirements and performance goals for your application w

Designing a schema is a critical part of any application. Like most databases, there are many options for modeling data in MongoDB, and it is important to incorporate the functional requirements and performance goals for your application when determining the best design. In this post, we’ll explore three approaches for using MongoDB when creating social inboxes or message timelines.

If you’re building a social network, like Twitter for example, you need to design a schema that is efficient for users viewing their inbox, as well as users sending messages to all their followers. The whole point of social media, after all, is that you can connect in real time.

There are several design considerations for this kind of application:

  • The application needs to support a potentially large volume of reads and writes.
  • Reads and writes are not uniformly distributed across users. Some users post much more frequently than others, and some users have many, many more followers than others.
  • The application must provide a user experience that is instantaneous.
  • Edit 11/6: The application will have little to no user deletions of data (a follow up blog post will include information about user deletions and historical data)

Because we are designing an application that needs to support a large volume of reads and writes we will be using a sharded collection for the messages. All three designs include the concept of “fan out,” which refers to distributing the work across the shards in parallel:

  1. Fan out on Read
  2. Fan out on Write
  3. Fan out on Write with Buckets

Each approach presents trade-offs, and you should use the design that is best for your application’s requirements.

The first design you might consider is called Fan Out on Read. When a user sends a message, it is simply saved to the inbox collection. When any user views their own inbox, the application queries for all messages that include the user as a recipient. The messages are returned in descending date order so that users can see the most recent messages.

To implement this design, create a sharded collection called inbox, specifying the from field as the shard key, which represents the address sending the message. You can then add a compound index on the to field and the sent field. Once the document is saved into the inbox, the message is effectively sent to all the recipients. With this approach sending messages is very efficient.

Viewing an inbox, on the other hand, is less efficient. When a user views their inbox the application issues a find command based on the to field, sorted by sent. Because the inbox collection uses from as its shard key, messages are grouped by sender across the shards. In MongoDB queries that are not based on the shard key will be routed to all shards. Therefore, each inbox view will be routed to all shards in the system. As the system scales and many users go to view their inbox, all queries will be routed to all shards. This design does not scale as well as each query being routed to a single shard.

With the “Fan Out on Read” method, sending a message is very efficient, but viewing the inbox is less efficient.

Fan out on Read is very efficient for sending messages, but less efficient for reading messages. If the majority of your application consists of users sending messages, but very few go to read what anyone sends them — let’s call it an anti-social app — then this design might work well. However, for most social apps there are more requests by users to view their inbox than there are to send messages.

The Fan out on Write takes a different approach that is more optimized for viewing inboxes. This time, instead of sharding our inbox collection on the sender, we shard on the message recipient. In this way, when we go to view an inbox the queries can be routed to a single shard, which will scale very well. Our message document is the same, but now save a copy of the message for every recipient.

With the “Fan Out on Write” method, viewing the inbox is efficient, but sending messages consumes more resources.

In practice we might implement the saving of messages asynchronously. Imagine two celebrities quickly exchange messages at a high-profile event - the system could quickly be saturated with millions of writes. By saving a first copy of their message, then using a pool of background workers to write copies to all followers, we can ensure the two celebrities can exchange messages quickly, and that followers will soon have their own copies. Furthermore, we could maintain a last-viewed date on the user document to ensure they have accessed the system recently - zombie accounts probably shouldn’t get a copy of the message, and for users that haven’t accessed their account recently we could always resort to our first design - Fan out on Read - to repopulate their inbox. Subsequent requests would then be fast again.

At this point we have improved the design for viewing inboxes by routing each inbox view to a single shard. However, each message in the user’s inbox will produce a random read operation. If each inbox view produces 50 random reads, then it only takes a relatively modest number of concurrent users to potentially saturate the disks. Fortunately we can take advantage of the document data model to further optimize this design to be even more efficient.

Fan out on Write with Buckets refines the Fan Out on Write design by “bucketing” messages together into documents of 50 messages ordered by time. When a user views their inbox the request can be fulfilled by reading just a few documents of 50 messages each instead of performing many random reads. Because read time is dominated by seek time, reducing the number of seeks can provide a major performance improvement to the application. Another advantage to this approach is that there are fewer index entries.

To implement this design we create two collections, an inbox collection and a user collection. The inbox collection uses two fields for the shard key, owner and sequence, which holds the owner’s user id and sequence number (i.e. the id of 50-message “bucket” documents in their inbox). The user collection contains simple user documents for tracking the total number of messages in their inbox. Since we will probably need to show the total number of messages for a user in a variety of places in our application, this is a nice place to maintain the count instead of calculating for each request. Our message document is the same as in the prior examples.

To send a message we iterate through the list of recipients as we did in the Fan out on Write example, but we also take another step to increment the count of total messages in the inbox of the recipient, which is maintained on the user document. Once we know the count of messages, we know the “bucket” in which to add the latest message. As these messages reach the 50 item threshold, the sequence number increments and we begin to add messages to the next “bucket” document. The most recent messages will always be in the “bucket” document with the highest sequence number. Viewing the most recent 50 messages for a user’s inbox is at most two reads; viewing the most recent 100 messages is at most three reads.

Normally a user’s entire inbox will exist on a single shard. However, it is possible that a few user inboxes could be spread across two shards. Because our application will probably page through a user’s inbox, it is still likely that every query for these few users will be routed to a single shard.

Fan out on Write with Buckets is generally the most scalable approach of the these three designs for social inbox applications. Every design presents different trade-offs. In this case viewing a user’s inbox is very efficient, but writes are somewhat more complex, and more disk space is consumed. For many applications these are the right trade-offs to make.

Schema design is one of the most important optimizations you can make for your application. We have a number of additional resources available on schema design if you are interested in learning more:

Fan out on Read
Fan out on Write
Fan out on Write with Buckets
Send Message Performance
Best
Single write
Good
Shard per recipient
Multiple writes
Worst
Shard per recipient
Appends (grows)
Read Inbox Performance
Worst
Broadcast all shards
Random reads
Good
Single shard
Random reads
Best
Single shard
Single read
Data Size
Best
Message stored once
Worst
Copy per recipient
Worst
Copy per recipient


Schema design is one of the most important optimizations you can make for your application. We have a number of additional resources available on schema design if you are interested in learning more:

  • Check out the recording of our recent Schema Design webinar on this topic.
  • Schema Design in general in the MongoDB Manual
  • You can also view our schema design resources on the MongoDB docs page
  • If you have any schema design questions, please view the archived questions on our user forum or ask a question yourselfon the MongoDB User Forum.
本站声明
本文内容由网友自发贡献,版权归原作者所有,本站不承担相应法律责任。如您发现有涉嫌抄袭侵权的内容,请联系admin@php.cn

热AI工具

Undresser.AI Undress

Undresser.AI Undress

人工智能驱动的应用程序,用于创建逼真的裸体照片

AI Clothes Remover

AI Clothes Remover

用于从照片中去除衣服的在线人工智能工具。

Undress AI Tool

Undress AI Tool

免费脱衣服图片

Clothoff.io

Clothoff.io

AI脱衣机

AI Hentai Generator

AI Hentai Generator

免费生成ai无尽的。

热门文章

R.E.P.O.能量晶体解释及其做什么(黄色晶体)
3 周前 By 尊渡假赌尊渡假赌尊渡假赌
R.E.P.O.最佳图形设置
3 周前 By 尊渡假赌尊渡假赌尊渡假赌
R.E.P.O.如果您听不到任何人,如何修复音频
3 周前 By 尊渡假赌尊渡假赌尊渡假赌

热工具

记事本++7.3.1

记事本++7.3.1

好用且免费的代码编辑器

SublimeText3汉化版

SublimeText3汉化版

中文版,非常好用

禅工作室 13.0.1

禅工作室 13.0.1

功能强大的PHP集成开发环境

Dreamweaver CS6

Dreamweaver CS6

视觉化网页开发工具

SublimeText3 Mac版

SublimeText3 Mac版

神级代码编辑软件(SublimeText3)

解决kernel_security_check_failure蓝屏的17种方法 解决kernel_security_check_failure蓝屏的17种方法 Feb 12, 2024 pm 08:51 PM

Kernelsecuritycheckfailure(内核检查失败)就是一个比较常见的停止代码类型,可蓝屏错误出现不管是什么原因都让很多的有用户们十分的苦恼,下面就让本站来为用户们来仔细的介绍一下17种解决方法吧。kernel_security_check_failure蓝屏的17种解决方法方法1:移除全部外部设备当您使用的任何外部设备与您的Windows版本不兼容时,则可能会发生Kernelsecuritycheckfailure蓝屏错误。为此,您需要在尝试重新启动计算机之前拔下全部外部设备。

没有任何内容显示 CMF Watch Pro 2 的全新设计,揭示了有关 CMF Phone 1 的好奇细节 没有任何内容显示 CMF Watch Pro 2 的全新设计,揭示了有关 CMF Phone 1 的好奇细节 Jun 27, 2024 am 10:42 AM

上周尚未宣布三款新产品将于 2024 年 7 月 8 日发布:CMF Watch Pro 2、CMF Phone 1 和 CMF Buds Pro 2。现在制造商发布了预告图片,揭示了这些产品的新设计细节

Win10如何卸载Skype for Business?电脑上的skype怎么彻底卸载方法 Win10如何卸载Skype for Business?电脑上的skype怎么彻底卸载方法 Feb 13, 2024 pm 12:30 PM

Win10skype可以卸载吗是很多用户们都想知道的一个问题,因为很多的用户们发现自己电脑上的默认程序上有这个应用,担心删除后会影响到系统的运行,下面就让本站来为用户们来仔细的介绍一下Win10如何卸载SkypeforBusiness吧。Win10如何卸载SkypeforBusiness1、在电脑桌面点击Windows图标,再点击设置图标进入。2、点击“应用”。3、在搜索框中输入“Skype”,点击选中找到的结果。4、点击“卸载”。5

Cyber​​truck车主评价全新战术灰内饰设计和交付体验 Cyber​​truck车主评价全新战术灰内饰设计和交付体验 Jun 30, 2024 pm 09:44 PM

除了推出时提供的时尚但易于染色的白色 Cyber​​truck 内饰外,特斯拉现在还推出了更多颜色的电动皮卡。新的战术灰色赛博卡车内饰颜色已经首次预览,这要归功于一个幸运的新产品

JavaScript怎么用for求n的阶乘 JavaScript怎么用for求n的阶乘 Dec 08, 2021 pm 06:04 PM

用for求n阶乘的方法:1、使用“for (var i=1;i

HMD Slate Tab 5G 泄露为中端平板电脑,搭载 Snapdragon 7s Gen 2、10.6 英寸显示屏和 Lumia 设计 HMD Slate Tab 5G 泄露为中端平板电脑,搭载 Snapdragon 7s Gen 2、10.6 英寸显示屏和 Lumia 设计 Jun 18, 2024 pm 05:46 PM

HMD Global 将于 7 月 10 日推出 Skyline,推出一款类似诺基亚 Lumia 920 风格的中端智能手机。根据泄密者 @smashx_60 的最新信息,Lumia 设计很快也将用于平板电脑,这将是 c

foreach和for循环的区别是什么 foreach和for循环的区别是什么 Jan 05, 2023 pm 04:26 PM

区别:1、for通过索引来循环遍历每一个数据元素,而forEach通过JS底层程序来循环遍历数组的数据元素;2、for可以通过break关键词来终止循环的执行,而forEach不可以;3、for可以通过控制循环变量的数值来控制循环的执行,而forEach不行;4、for在循环外可以调用循环变量,而forEach在循环外不能调用循环变量;5、for的执行效率要高于forEach。

Python中的常见流程控制结构有哪些? Python中的常见流程控制结构有哪些? Jan 20, 2024 am 08:17 AM

Python中常见的流程控制结构有哪几种?在Python中,流程控制结构是用来决定程序的执行顺序的重要工具。它们允许我们根据不同的条件执行不同的代码块,或者重复执行一段代码。下面将介绍Python中常见的流程控制结构,并提供相应的代码示例。条件语句(if-else):条件语句允许我们根据不同的条件执行不同的代码块。它的基本语法是:if条件1:#当条件

See all articles