
How Memory & Disk Performance Affects Your MongoDB Database


This article was originally published on MongoDB. Thank you for supporting the partners who make SitePoint possible.

Understanding the relationships between the various internal caches and disk performance, and how those relationships affect database and application performance, can be challenging. We used the YCSB benchmark, varying the working set (the number of documents used in the test) and disk performance, to better demonstrate how they relate. While reviewing the results, we introduce some MongoDB internals to improve understanding of common database usage patterns.

Key Points

  1. Understanding disk baseline performance is critical to understanding overall database performance.
  2. High disk waiting time and utilization indicate a disk bottleneck.
  3. WiredTiger IO is random.
  4. Queries to a single replica set are single-threaded and sequential.
  5. Disk performance is closely related to the working set size.

Abstract

The main influence on overall system performance is how the working set relates to the storage engine cache size (the memory dedicated to storing data) and to disk performance (which places a physical limit on how quickly data can be accessed).

Using YCSB, we explored the interaction between disk performance and cache size, demonstrating how these two factors affect performance. Bear in mind that although these tests use YCSB, synthetic benchmarks do not represent production workloads, and the latency and throughput numbers obtained this way do not map directly to production performance. We used MongoDB 3.4.10, YCSB 0.14, and the MongoDB 3.6.0 driver for these tests. YCSB was configured with 16 threads and the "uniform" read-only workload.
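
The article does not list the exact YCSB invocation; a run along the following lines matches the stated configuration (16 threads, uniform request distribution, read-only Workload C). The connection string, record count, and operation count are placeholders.

    # Load the records, then run a uniform, read-only workload with 16 threads.
    bin/ycsb load mongodb -s -P workloads/workloadc \
        -p mongodb.url=mongodb://localhost:27017/ycsb \
        -p recordcount=2000000 -threads 16

    bin/ycsb run mongodb -s -P workloads/workloadc \
        -p mongodb.url=mongodb://localhost:27017/ycsb \
        -p recordcount=2000000 -p operationcount=10000000 \
        -p requestdistribution=uniform -threads 16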

We demonstrate that keeping the working set in memory provides optimal application performance, and that, as with any database, exceeding this limit negatively affects latency and overall throughput.

Understanding disk metrics

When considering disk performance, four metrics are important:

  1. Disk throughput, i.e. the number of requests multiplied by the request size, usually measured in megabytes per second. Random read and write performance with 4KB requests best represents standard database workloads. Note that many cloud providers limit disk throughput or bandwidth.
  2. Disk latency. On Linux this is reported as await, the time in milliseconds from when a request is issued by the application until the data is written or returned to the application. For SSDs, latency is typically under 3 ms; for HDDs it is typically above 7 ms. High latency means the disk is struggling to keep up with the given workload.
  3. Disk IOPS (input/output operations per second). iostat reports this metric as tps. A given cloud provider may guarantee a certain number of IOPS for a given drive. If you reach this threshold, further accesses are queued, creating a disk bottleneck. A high-end PCIe-attached NVMe device can deliver 1,500,000 IOPS, while a typical hard drive may support only 150.
  4. Disk utilization. Reported as util by iostat. Linux maintains multiple queues per device for servicing IO; utilization indicates what percentage of these queues are busy at a given time. While this number can be confusing, it is a good indicator of overall disk health (see the iostat example below).
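
All four metrics can be sampled on Linux with iostat, from the sysstat package; a minimal example (column names vary slightly between sysstat versions):

    # Extended device statistics in MB, refreshed every 5 seconds:
    #   r/s + w/s     -> IOPS (plain `iostat` reports the combined figure as tps)
    #   rMB/s + wMB/s -> throughput
    #   await         -> latency in milliseconds
    #   %util         -> utilization
    iostat -xm 5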

Testing disk performance

While cloud providers publish IOPS thresholds for a given volume and disk type, and disk manufacturers publish expected performance figures, the actual results on your system may vary. If the observed disk performance is in question, running an IO test can be very helpful.

We usually test with fio, the Flexible IO Tester. We tested against 10GB of data using the psync ioengine and read sizes ranging between 4KB and 32KB. While the default fio settings do not represent the WiredTiger workload, we found this configuration to be a good approximation of WiredTiger disk utilization.
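
The full fio command is not given in the article; an invocation along these lines matches the description (psync engine, random reads of 4KB to 32KB against 10GB of data), with the remaining parameters being assumptions:

    # Random reads with block sizes spread between 4KB and 32KB over a 10GB file.
    # --direct=1 bypasses the file system cache so the disk itself is measured.
    fio --name=wiredtiger-approx --ioengine=psync --rw=randread \
        --bsrange=4k-32k --size=10g --direct=1 --runtime=60 --time_based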

All tests were repeated under three disk scenarios:

Scenario 1

The default disk settings of an AWS c5 instance with a 100GB io1 volume provisioned at 5,000 IOPS.

  • 1,144 IOPS / 5,025 physical reads per second / 99.85% utilization

Scenario 2

The disk limited to 600 IOPS, with 7 ms of latency introduced. This should reflect the performance of a typical RAID10 SAN built on hard drives.

  • 134 IOPS / 150 physical reads per second / 95.72% utilization

Scenario 3

The disk further limited to 150 IOPS, with 7 ms of latency. This should simulate an ordinary spinning hard drive.

  • 34 IOPS / 150 physical reads per second / 98.2% utilization

How are queries served from disk?

The WiredTiger storage engine maintains its own cache. By default, the WiredTiger cache size is 50% of system memory minus 1GB, leaving room for other system processes, the file system cache, and internal MongoDB operations that consume additional memory (such as building indexes, performing in-memory sorts, deduplicating results, text scoring, join processing, and aggregation). To prevent performance degradation when the cache is full, WiredTiger automatically begins evicting data from the cache once utilization exceeds 80%. For our tests, this means the effective cache size is (7634MB − 1024MB) × 0.5 × 0.8, or 2644MB.
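
You can confirm the configured cache size and watch cache activity from the mongo shell via serverStatus; the field names below come from the WiredTiger section of the MongoDB 3.4 output, and the selection is illustrative:

    // Inspect WiredTiger cache configuration and activity.
    var cache = db.serverStatus().wiredTiger.cache;
    print("configured maximum: " + cache["maximum bytes configured"]);
    print("currently in cache: " + cache["bytes currently in the cache"]);
    print("read into cache:    " + cache["bytes read into cache"]);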

All queries are serviced from the WiredTiger cache. This means a query causes the required indexes and documents to be read through the file system cache into the WiredTiger cache before the result is returned. If the requested data is already in the cache, this step is skipped.

By default, WiredTiger stores documents using the snappy compression algorithm. Any data read through the file system cache is decompressed before being stored in the WiredTiger cache. Indexes use prefix compression by default and are compressed both on disk and in the WiredTiger cache.
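
The block compressor can also be chosen per collection at creation time; a sketch, with an illustrative collection name:

    // snappy is already the default; zlib trades CPU for a better ratio,
    // and "none" disables block compression entirely.
    db.createCollection("events", {
      storageEngine: { wiredTiger: { configString: "block_compressor=snappy" } }
    });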

The file system cache is an operating system structure that keeps frequently accessed files in memory for faster access. Linux caches files aggressively and will attempt to consume all available memory with the file system cache. If more memory is needed, file system cache pages are evicted to make room for applications.

Below is an animated graph showing the disk accesses against the YCSB collection generated by 100 YCSB read operations. Each operation is an individual lookup of a single document by _id.

The upper-left corner represents the first byte of the WiredTiger collection file. Disk locations increase to the right and wrap around. Each line represents a 3.5MB segment of the WiredTiger collection file. Accesses are ordered chronologically, one per animation frame, and are shown as red and green squares to highlight the current disk access.

[Animation: disk accesses across the WiredTiger collection file during 100 YCSB reads]

3.5 MB vs 4KB

Here we see our collection's data file being read into memory. Because the data is stored in a B-tree, we may need to find the on-disk location of a document (the smaller accesses) by accessing one or more locations on disk before the document itself can be found and read (the larger accesses).

This demonstrates the typical access pattern of MongoDB queries: documents are unlikely to be close to one another on disk. It also shows that documents are very unlikely to occupy consecutive disk locations, even when inserted one after another.

The WiredTiger storage engine is designed to "read fully": it issues a read request for all the data it needs at once. This is why we recommend setting disk read-ahead to zero for WiredTiger deployments; subsequent accesses are unlikely to benefit from the extra data retrieved by read-ahead.
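
On Linux, read-ahead can be set per block device with blockdev; the device path below is a placeholder for your data volume:

    # Set read-ahead to zero for the data volume and verify the change.
    sudo blockdev --setra 0 /dev/xvdf
    sudo blockdev --getra /dev/xvdf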

The working set fits in cache

For our first set of tests, we set the record count to 2 million, resulting in a total data and index size of 2.43GB, or 92% of the cache.

Here we see strong performance in Scenario 1: 76,113 requests per second. Checking the file system cache statistics, we observed a 100% WiredTiger cache hit rate, with no accesses to, and no bytes read into, the file system cache, meaning no additional IO was required throughout the test.

As expected, in Scenarios 2 and 3, changing disk performance (adding 7 ms of latency and limiting IOPS to 600 or 150) had minimal impact on throughput (69,579.5 and 70,252 operations per second, respectively).

[Figure: YCSB throughput with the working set fully in the WiredTiger cache]

The 99th-percentile response latency for all three tests ranged from 0.40 to 0.44 milliseconds.

The working set is larger than the WiredTiger cache but still fits in the file system cache

Modern operating systems cache frequently accessed files to improve read performance. Because the file is already in memory, accessing a cached file does not incur a physical read. The file system cache statistics displayed by the Linux free command detail the size of the file system cache.
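
For example:

    # The buff/cache column shows memory currently used by the file system
    # cache (and kernel buffers); "available" estimates what could be
    # reclaimed for applications.
    free -m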

When we increased the record count from 2 million to 3 million, the total data and index size grew to 3.66GB, 38% larger than can be served from the WiredTiger cache alone.

The metrics clearly show that we read an average of 548 MBps into the WiredTiger cache, but when checking the file system cache metrics we observe a hit rate of 99.9%.

In this test we begin to see a performance drop: only 66,720 operations per second, an 8% decrease from our baseline, which was served entirely from the WiredTiger cache.

As expected, in this case reduced disk performance does not significantly affect our overall throughput (64,484 and 64,229 operations per second, respectively). The penalty for reading from the file system cache would be more pronounced for documents that compress well, or if the CPU were the limiting factor.

[Figure: YCSB throughput with the working set served from the file system cache]

We noticed a 54% increase in p99 latency, to 0.53–0.55 ms.

The working set is slightly larger than the WiredTiger and file system caches

We have established that the WiredTiger and file system caches work together to serve our queries. However, when we increase the record count from 3 million to 4 million, these caches alone can no longer serve all queries. Our data size grows to 4.8GB, 82% larger than our WiredTiger cache.

Here, we read into the WiredTiger cache at 257.4 MBps. Our file system cache hit rate drops to 93–96%, meaning 4–7% of reads result in physical reads from disk.

Changing the available IOPS and disk latency has a huge impact on performance in this test.

The 99th-percentile response latency increases further: 19 ms in Scenario 1, 171 ms in Scenario 2, and 770 ms in Scenario 3, or 43, 389, and 1751 times the in-cache latency.

Compared to our previous, fully cached tests, we see a 75% drop in performance even when MongoDB has the full 5,000 IOPS available. Scenarios 2 and 3 achieved 5,139.5 and 737.95 operations per second respectively, further demonstrating the IO bottleneck.

[Figure: YCSB throughput with the working set slightly larger than both caches]

The working set is much larger than the WiredTiger and file system caches

Moving to 5 million records, we grow the data and index size to 6.09GB, larger than our combined WiredTiger and file system caches. Our throughput falls below our IOPS. In this case we are still serving 81% of WiredTiger reads from the file system cache, but the reads that overflow to disk saturate our IO. File system cache read speeds for this test were 71, 8.3, and 1.9 MBps, respectively.

The 99th-percentile response latency increases further still: 22 ms in Scenario 1, 199 ms in Scenario 2, and 810 ms in Scenario 3, or 52, 454, and 1841 times the in-cache response latency. Here, changing disk IOPS significantly affects throughput.

[Figure: YCSB throughput with the working set much larger than both caches]

Summary

Through this series of tests, we demonstrated two main points.

  1. If the working set fits in cache, disk performance does not greatly affect application performance.
  2. Disk performance quickly becomes a limiting factor in throughput when the working set exceeds available memory.

Understanding how MongoDB uses memory and disk is an important part of sizing a deployment and understanding its performance. The inner workings of the WiredTiger storage engine attempt to make the most of the hardware, but memory and disk are two critical pieces of infrastructure that shape the overall performance characteristics of your workload.

Frequently Asked Questions about Memory and Disk Performance in MongoDB

How does MongoDB utilize memory and disk space?

MongoDB uses both memory and disk to store and manage data. The legacy MMAPv1 storage engine used memory-mapped files, mapping data files into RAM and letting the operating system's virtual memory subsystem swap data in and out as needed. The default WiredTiger storage engine instead maintains its own internal cache alongside the file system cache, as described above. Disk space is used to store data files, indexes, and journal files, and MongoDB allocates disk space in large chunks to optimize write operations.

What is the impact of high disk I/O utilization in MongoDB?

High disk I/O utilization can seriously affect the performance of a MongoDB database. It leads to slower read and write operations, which reduces the overall performance of the application. This is especially problematic for applications that require real-time data access. High disk I/O utilization can also increase CPU usage, as the system spends more time managing disk operations.

How to monitor disk space usage in MongoDB?

MongoDB provides several tools for monitoring disk space usage. The db.stats() command gives a high-level overview of a database, including the total size of its data files and indexes. The db.collection.stats() command provides more detailed information about a specific collection, including the size of its data and indexes. In addition, MongoDB Atlas, MongoDB's database-as-a-service offering, provides a comprehensive set of monitoring tools, including alerts for high disk space usage.
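
For example, from the mongo shell (the collection name is a placeholder):

    // Pass a scale factor to report sizes in megabytes instead of bytes.
    db.stats(1024 * 1024);
    db.usertable.stats(1024 * 1024);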

How to address high disk space utilization in MongoDB?

There are several strategies for addressing high disk space utilization in MongoDB. One is to delete unnecessary data or collections. Another is the compact command, which defragments data files and reclaims unused disk space; note that it requires a significant amount of free disk space to run and can affect database performance. Sharding, which distributes data across multiple servers, can also help manage disk space usage.
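
A minimal example of the compact command, with a placeholder collection name (compact blocks operations on the database in older MongoDB versions, so schedule it for a maintenance window):

    // Defragment one collection and release unused space back to the engine.
    db.runCommand({ compact: "usertable" });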

What is a RAM drive, and what does it have to do with MongoDB?

A RAM drive is a portion of memory that the operating system treats as a disk drive. Because RAM is much faster than disk storage, a RAM drive can significantly improve the performance of applications that require high-speed data access. However, because RAM is volatile, data stored on a RAM drive is lost when the system restarts. In the context of MongoDB, a RAM drive could be used to hold frequently accessed data or indexes for better performance, but this should be done with caution, as data may be lost if the system restarts.

How does MongoDB handle memory management?

MongoDB relies heavily on the underlying operating system for memory management. With the legacy MMAPv1 engine, memory-mapped files let the operating system's virtual memory subsystem decide which data stays in memory and which remains on disk; WiredTiger manages its own cache but still depends on the file system cache. This approach allows MongoDB to process large datasets efficiently, but it also means MongoDB's memory usage can be affected by other processes running on the same system.

How to optimize the memory usage of MongoDB?

There are several strategies for optimizing MongoDB's memory usage. One is to make sure your working set fits in memory; the working set is the portion of your data that is accessed most frequently, and if it fits in memory, MongoDB can avoid costly disk I/O operations. Another is to use indexes efficiently: indexes can significantly improve query performance, but they also consume memory, so create them judiciously and monitor their impact on memory usage.
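
For example (collection and field names are illustrative):

    // Create an index, then check how many bytes each index consumes.
    db.users.createIndex({ email: 1 });
    db.users.stats().indexSizes;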

How does MongoDB handle disk I/O operations?

MongoDB uses a write-ahead log, the journal, to ensure data integrity: changes are written to the journal before they are applied to the data files, which allows MongoDB to recover from a crash or power failure. However, journaling also adds disk I/O, which can affect performance, so it is important to monitor disk I/O utilization and optimize it where necessary.

How to optimize disk I/O operations of MongoDB?

There are several strategies for optimizing MongoDB's disk I/O. One is to use SSDs, which can sustain far more IOPS than traditional hard drives. Another is to use a RAID configuration optimized for write operations. You can also adjust MongoDB's journaling settings to reduce their impact on disk I/O, though this should be done with caution, as it affects durability guarantees.
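
As a sketch, the journal commit interval can be tuned in mongod.conf; the value shown is illustrative, and raising it widens the window of writes that could be lost in a crash:

    # Excerpt from mongod.conf (YAML).
    storage:
      journal:
        enabled: true
        commitIntervalMs: 100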

How do memory and disk performance affect the overall performance of a MongoDB database?

Memory and disk performance are key factors in the overall performance of a MongoDB database. If your working set fits in memory, MongoDB can avoid costly disk I/O operations, which significantly improves performance. Likewise, efficient disk I/O improves write performance and helps ensure data integrity. It is therefore important to monitor and optimize both memory and disk performance to get the best out of a MongoDB database.

