The title is a bit of a teaser, but the three tips below are genuinely practical. They come from a talk given by Colin Howe, VP at Conversocial, at the London MongoDB User Group.
A caveat before we start: these points are not universally applicable. Whether they help depends on your own application scenarios and data characteristics.
The first tip is to use combined, larger documents. MongoDB is a document database: each record is a JSON-style document. For example, suppose one statistics document is generated per day:
{ metric: "content_count", client: 5, value: 51, date: ISODate("2012-04-01T13:00:00Z") }
{ metric: "content_count", client: 5, value: 49, date: ISODate("2012-04-02T13:00:00Z") }
If you instead combine them into one large document, you can store a whole month's data in a single record, like this:
{ metric: "content_count", client: 5, month: "2012-04", 1: 51, 2: 49, ... }
Data was preloaded using both storage schemes, about 7 GB in total (on a machine with only 1.7 GB of memory), and the test then read one year of data. The difference in read performance was obvious:
First scheme (daily documents): 1.6 seconds
Second scheme (combined monthly documents): 0.3 seconds
So where is the problem?
The reason is that combined storage reads far fewer documents. When the documents cannot all fit in memory, the cost is dominated by disk seeks. To fetch a year of data, the first scheme must read many more documents, so it incurs many more disk seeks and is therefore slower.
In fact foursquare, a well-known MongoDB user, makes heavy use of this technique to improve read performance.
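As a minimal sketch of the reshaping involved, the following Node.js snippet folds per-day documents into the combined monthly form shown above. The helper name `combineIntoMonthly` is hypothetical, and plain "YYYY-MM-DD" date strings stand in for the ISODate values a real collection would hold:

```javascript
// Sketch (hypothetical helper): fold per-day metric documents into one
// combined monthly document, with the day of month as the field name.
function combineIntoMonthly(dailyDocs) {
  const byMonth = {};
  for (const d of dailyDocs) {
    const month = d.date.slice(0, 7);                 // "2012-04"
    const key = `${d.metric}|${d.client}|${month}`;   // one doc per metric/client/month
    if (!byMonth[key]) byMonth[key] = { metric: d.metric, client: d.client, month };
    const day = Number(d.date.slice(8, 10));          // day of month, 1-31
    byMonth[key][day] = d.value;                      // day number becomes a field
  }
  return Object.values(byMonth);
}

const daily = [
  { metric: "content_count", client: 5, value: 51, date: "2012-04-01" },
  { metric: "content_count", client: 5, value: 49, date: "2012-04-02" },
];
const monthly = combineIntoMonthly(daily);
console.log(monthly.length); // 1 combined document for April
```

Reading a month then touches one document instead of up to 31, which is where the seek savings come from.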
The second tip concerns index field order. Like traditional databases, MongoDB uses B-trees as its index data structure. For a tree-shaped index, the more concentrated the storage of the hot part of the index is, the less memory the index wastes. So compare the following two index structures:
db.metrics.ensureIndex({ metric: 1, client: 1, date: 1})
and
db.metrics.ensureIndex({ date: 1, metric: 1, client: 1 })
With these two different structures, the difference in insert performance is also obvious.
With the first structure, insert speed stays at roughly 10k/s while the data volume is below 20 million documents. Beyond that, insert speed gradually drops to 2.5k/s, and as the data keeps growing, performance may degrade even further.
With the second structure, insert speed remains stable at roughly 10k/s.
The reason is that the second structure puts the date field first, so when new data updates the index, the changes happen at the tail of the index rather than in the middle. Entries inserted earlier rarely need to be touched by later inserts. In the first case, since the date field is not in front, index updates keep landing in the middle of the tree, causing frequent, large-scale changes to the index structure.
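This can be illustrated without a database. The sketch below (pure Node.js, a simulation rather than MongoDB's actual B-tree code) sorts compound keys under both field orders and checks where a newly arriving document, which always carries the latest date, would land in the sorted index:

```javascript
// Compare compound keys position by position, like an index key comparator.
function compareKeys(x, y) {
  for (let i = 0; i < x.length; i++) {
    if (x[i] < y[i]) return -1;
    if (x[i] > y[i]) return 1;
  }
  return 0;
}

// Position at which a new key would be inserted in a sorted index.
function insertPosition(index, key) {
  let pos = 0;
  while (pos < index.length && compareKeys(index[pos], key) < 0) pos++;
  return pos;
}

// Existing data: two metrics, five clients, dates up to "2012-04-01".
const metrics = ["content_count", "page_views"];
const clients = [1, 2, 3, 4, 5];
const dates = ["2012-01-01", "2012-02-01", "2012-03-01", "2012-04-01"];

const byMetric = []; // keys ordered as { metric: 1, client: 1, date: 1 }
const byDate = [];   // keys ordered as { date: 1, metric: 1, client: 1 }
for (const m of metrics)
  for (const c of clients)
    for (const d of dates) {
      byMetric.push([m, c, d]);
      byDate.push([d, m, c]);
    }
byMetric.sort(compareKeys);
byDate.sort(compareKeys);

// A new record always carries the latest date:
const posMetricFirst = insertPosition(byMetric, ["content_count", 3, "2012-05-01"]);
const posDateFirst = insertPosition(byDate, ["2012-05-01", "content_count", 3]);

console.log(posMetricFirst, "of", byMetric.length); // 12 of 40 — lands mid-index
console.log(posDateFirst, "of", byDate.length);     // 40 of 40 — appends at the tail
```

With date first, every new insert appends at the right-hand edge of the tree; with metric first, inserts scatter through the middle, forcing ongoing restructuring.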
The third tip is to reserve space in advance. Like point 1, it relies on the fact that a traditional mechanical hard disk spends most of its operating time on seek operations.
Taking the example from point 1: when inserting data, we insert all twelve monthly documents for the year at once, allocating all the space the year's data will need. This ensures that the twelve months of data are stored sequentially on disk, so reading a year's worth of data may require only one sequential read operation, with a single disk seek instead of the previous twelve.
db.metrics.insert([
    { metric: 'content_count', client: 3, date: '2012-01', 0: 0, 1: 0, 2: 0, ... },
    { metric: 'content_count', client: 3, date: '2012-02', ... },
    { metric: 'content_count', client: 3, date: '2012-03', ... },
    { metric: 'content_count', client: 3, date: '2012-04', ... },
    { metric: 'content_count', client: 3, date: '2012-05', ... },
    { metric: 'content_count', client: 3, date: '2012-06', ... },
    { metric: 'content_count', client: 3, date: '2012-07', ... },
    { metric: 'content_count', client: 3, date: '2012-08', ... },
    { metric: 'content_count', client: 3, date: '2012-09', ... },
    { metric: 'content_count', client: 3, date: '2012-10', ... },
    { metric: 'content_count', client: 3, date: '2012-11', ... },
    { metric: 'content_count', client: 3, date: '2012-12', ... }
])
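The preallocated documents above can be generated programmatically. The following Node.js sketch (the helper name `preallocateYear` is hypothetical, and the actual insert through the shell or a driver is not shown) builds twelve monthly documents with zero-filled day fields so that the space is reserved up front:

```javascript
// Sketch: build twelve preallocated monthly documents for one year.
// Day fields 0..daysInMonth-1 are zero-filled so the full document size
// is reserved at insert time.
function preallocateYear(metric, client, year) {
  const docs = [];
  for (let month = 1; month <= 12; month++) {
    const date = `${year}-${String(month).padStart(2, "0")}`; // e.g. "2012-01"
    const doc = { metric, client, date };
    const days = new Date(year, month, 0).getDate(); // days in this month
    for (let day = 0; day < days; day++) doc[day] = 0; // zero-filled slots
    docs.push(doc);
  }
  return docs;
}

const docs = preallocateYear("content_count", 3, 2012);
console.log(docs.length);   // 12
console.log(docs[0].date);  // "2012-01"
```

In the shell, a single `db.metrics.insert(docs)` would then write all twelve documents in one batch, keeping them adjacent on disk.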
Result:
Without reserved space, reading one year's records takes 62 ms.
With reserved space, reading one year's records takes only 6.6 ms.