MongoDB + Redis task queue performance bottleneck

Question

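Background: I have recently been refactoring an important internal task system at my company. The original system stored tasks in MongoDB, and clients fetched them from MongoDB. Why MongoDB? Partly for historical reasons, and partly because using MongoDB's array queries can reduce task...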

迷茫 · Answer

Some preliminary thoughts, for reference only:

First, indexes: I assume you have already indexed the fields you query on.
One thing I'd like to confirm: as far as I know, even the latest MongoDB (at the time of writing) still locks at the database level rather than per collection; I don't know which version you're using. (MongoDB 3.0 later added collection-level locking for MMAPv1 and document-level concurrency with WiredTiger.) Write-heavy concurrency therefore suffers, but performance still shouldn't be as bad as you describe, should it? I'm puzzled. Have you considered splitting the tasks across several databases? With a database-level lock, each database has its own lock, so writes contend less.
Could you store subtask status and main-task status separately? Put subtask status in Redis and let the main task track only its own status. Each main task is then updated 1/N as often (N being the number of subtasks per main task), which greatly reduces the write pressure on the main-task collection in MongoDB.
Once the subtasks have completed or timed out, could a single background thread then asynchronously sync the main-task status back to MongoDB, one update at a time?
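The last two suggestions (subtask status in Redis, main-task status synced by one background thread) can be sketched in-process as below. This is a minimal sketch under stated assumptions: plain Python dicts stand in for Redis and for the MongoDB main-task collection, and the names `SubtaskTracker`, `finish_subtask`, and the `"finished"` status are hypothetical. Frequent subtask updates touch only the Redis stand-in; a main task is written once, by a single worker thread, when its last subtask completes or times out.

```python
import queue
import threading


class SubtaskTracker:
    """Hypothetical sketch: dicts stand in for Redis and MongoDB."""

    def __init__(self, subtasks_per_task):
        self.subtasks_per_task = subtasks_per_task
        self.redis_status = {}     # Redis stand-in: (task_id, sub_id) -> subtask status
        self.main_tasks = {}       # MongoDB stand-in: task_id -> main-task status
        self.main_task_writes = 0  # how many writes would reach MongoDB
        self._finished = {}        # task_id -> set of finished sub_ids
        self._lock = threading.Lock()
        self._events = queue.Queue()
        # A single background thread applies main-task updates sequentially.
        self._worker = threading.Thread(target=self._sync_loop, daemon=True)
        self._worker.start()

    def finish_subtask(self, task_id, sub_id, status="done"):
        """Record a subtask as done or timed out; only the Redis stand-in is touched."""
        with self._lock:
            self.redis_status[(task_id, sub_id)] = status
            finished = self._finished.setdefault(task_id, set())
            before = len(finished)
            finished.add(sub_id)
            just_completed = (len(finished) == self.subtasks_per_task
                              and before < len(finished))
        if just_completed:
            # The main task is queued for one write, when its last subtask finishes.
            self._events.put((task_id, "finished"))

    def _sync_loop(self):
        while True:
            item = self._events.get()
            if item is None:  # shutdown sentinel
                break
            task_id, status = item
            self.main_tasks[task_id] = status  # would be a MongoDB update in practice
            self.main_task_writes += 1

    def close(self):
        """Let queued updates drain, then stop the worker thread."""
        self._events.put(None)
        self._worker.join()


tracker = SubtaskTracker(subtasks_per_task=4)
for task_id in ("t1", "t2"):
    for sub_id in range(4):
        tracker.finish_subtask(task_id, sub_id)
tracker.close()
print(tracker.main_tasks)        # {'t1': 'finished', 't2': 'finished'}
print(tracker.main_task_writes)  # 2, versus 8 if every subtask update hit the main task
```

Because one thread owns all main-task writes, they never contend with each other; the trade-off is that MongoDB's view of a main task lags slightly behind Redis.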

阿神 · Answer

Personally, I think the MongoDB array query and update performance problems the asker describes are most likely caused by the schema design. Since no concrete design was given, here are a few points worth watching, for reference only:

Indexes: as mentioned above, you should have indexed the array field. Note, however, that an index on an array field (a multikey index) is much larger than an index on an ordinary field, because it holds roughly one entry per array element; the larger the arrays, the more space the index takes. This can lead to a problem: the index no longer fits (completely) in memory! Each query then needs extra disk I/O, and performance drops sharply.
Size of the returned documents: if each query returns a lot of document data and the client is not on the same machine as MongoDB, network transfer time grows (don't underestimate it), so return only the fields you actually need.
Update-in-place: because documents have a flexible schema, MongoDB reserves some extra space (padding) for each document record so that added fields or data can be written in place, which keeps updates fast. But if your documents grow frequently (new fields, longer arrays, etc.), write performance suffers: once a document outgrows its space, MongoDB must move it elsewhere (much like moving data from one place on disk to a freer one), and performance drops considerably at that point. (This describes the classic MMAPv1 storage engine.)
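The first two points, index size for array fields and returning only the needed fields, can be illustrated with a rough in-memory model. This is a sketch under stated assumptions, not MongoDB's actual index accounting: `multikey_index_entries` and `project` are hypothetical helpers, the field names (`task_id`, `worker_ids`, `payload`) are made up, and the entry count treats each array element as roughly one index key.

```python
import json


def multikey_index_entries(docs, field):
    """Rough estimate of index keys for an index on `field`.

    Approximation: an array value contributes about one key per element
    (an empty or missing value still gets one entry); a scalar contributes one.
    """
    total = 0
    for doc in docs:
        value = doc.get(field)
        if isinstance(value, list):
            total += max(len(value), 1)
        else:
            total += 1
    return total


def project(doc, fields):
    """Mimic a find() projection such as {"task_id": 1, "status": 1}.

    MongoDB returns `_id` by default, so it is kept here too.
    """
    keep = set(fields) | {"_id"}
    return {k: v for k, v in doc.items() if k in keep}


# Hypothetical task documents: 100 tasks, each with a 50-element array
# and a 1000-character payload.
docs = [
    {"task_id": i, "status": "pending", "worker_ids": list(range(50)), "payload": "x" * 1000}
    for i in range(100)
]
print(multikey_index_entries(docs, "worker_ids"))  # 5000
print(multikey_index_entries(docs, "status"))      # 100
full_size = len(json.dumps(docs[0]))
slim_size = len(json.dumps(project(docs[0], ["task_id", "status"])))
print(full_size, slim_size)
```

The array index holds about 50 times as many entries as the scalar one for the same 100 documents, which is why it is the first to spill out of memory; the projection shows how much smaller each returned document gets when the payload is left out.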

MongoDB relies heavily on memory: if all your hot data fits in RAM, its performance is excellent, and whether it fits depends largely on your schema design.

PS: The "schemaless" advantage that MongoDB has always promoted has misled many people. It really means MongoDB has a dynamic schema, not that you don't need to design a schema.

大家讲道理 · Answer

You could consider RabbitMQ for the task queue. Also, MongoDB shouldn't be that slow, should it? Are you missing an index? Or try a capped collection.
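The capped-collection suggestion can be illustrated with a small in-memory model. In MongoDB itself, a capped collection is created with a byte limit (`size`) and an optional document limit (`max`), for example through PyMongo's `db.create_collection("tasks", capped=True, size=..., max=...)` (collection name hypothetical), and consumers can follow it with a tailable cursor. The sketch below only models the ordering and eviction behavior, bounded by document count alone; `CappedCollectionModel` is a hypothetical name.

```python
from collections import deque


class CappedCollectionModel:
    """Hypothetical model of capped-collection semantics: a fixed maximum
    number of documents, insertion order preserved, and the oldest
    documents evicted automatically once the cap is reached."""

    def __init__(self, max_docs):
        self._docs = deque(maxlen=max_docs)

    def insert(self, doc):
        self._docs.append(doc)  # a full deque drops its oldest item

    def find(self):
        return list(self._docs)  # natural (insertion) order


queue_model = CappedCollectionModel(max_docs=3)
for i in range(5):
    queue_model.insert({"task_id": i})
print(queue_model.find())  # [{'task_id': 2}, {'task_id': 3}, {'task_id': 4}]
```

The fixed size and insertion ordering are what make a capped collection cheap to write to as a queue; the catch is that old tasks silently disappear once the cap is hit, so the cap must be sized for the backlog you expect.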